CN117973337A - Table reconstruction method, apparatus, electronic device and storage medium - Google Patents
- Publication number
- CN117973337A (application number CN202410102694.3A)
- Authority
- CN
- China
- Prior art keywords: image, cell, feature, candidate, determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/177—Editing, e.g. inserting or deleting of tables; using ruled lines
- G06F40/18—Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Abstract
The invention provides a table reconstruction method, a table reconstruction apparatus, an electronic device and a storage medium, applied in the technical field of image processing. The method comprises the following steps: acquiring a table image; extracting image features of the table image, and determining cell categories, cell coordinates and cell pixel masks of the table image according to the image features; performing grid line reconstruction according to the cell coordinates and the cell pixel masks to obtain a first table, and performing cell merging on the first table according to the cell categories to obtain a second table. The cell categories comprise blank cells, basic cells and merged cells.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a table reconstruction method, apparatus, electronic device, and storage medium.
Background
Table recognition technology uses a computer system to automatically parse the table regions contained in an image into a structured form and store it. Table recognition can quickly and effectively help people identify and understand the table content in an image, and can rapidly parse the table into a computer-readable format to facilitate electronic storage and subsequent analysis of the table content.
In the prior art, the structure recognition branch and the content recognition branch of a table image are separated, and the structural analysis of the table content is then achieved through a merging step.
However, because the structure recognition branch and the content recognition branch differ in modality information, the content recognition branch lacks the interdependence between structural elements, and the recognition performance is therefore poor.
Disclosure of Invention
The invention provides a table reconstruction method, a table reconstruction apparatus, an electronic device and a storage medium, which are used to solve the problem in prior-art table recognition that the content recognition branch, lacking structural information, yields poor recognition performance.
The invention provides a table reconstruction method, which comprises the following steps: acquiring a table image; extracting image features of the table image, and determining cell categories, cell coordinates and cell pixel masks of the table image according to the image features; performing grid line reconstruction according to the cell coordinates and the cell pixel masks to obtain a first table, and performing cell merging on the first table according to the cell categories to obtain a second table; the cell categories comprise blank cells, basic cells and merged cells.
According to the present invention, there is provided a table reconstruction method, wherein the determining a cell category, a cell coordinate and a cell pixel mask of the table image according to the image feature includes: determining a first candidate box and a first candidate feature; predicting a target object feature from the image feature, the first candidate box and the first candidate feature; and obtaining the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.
According to the present invention, there is provided a table reconstruction method, wherein the determining a target object feature according to the image feature, the first candidate box and the first candidate feature includes: performing multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature; performing box region-of-interest alignment on the first candidate box and the image feature to obtain a first box feature; performing dynamic-convolution enhancement on the second candidate feature and the first box feature to obtain a second box feature, determining a second candidate box based on the second box feature, and updating the first candidate box to the second candidate box; performing mask region-of-interest alignment on the second box feature and the image feature to obtain a first pixel mask; and performing dynamic-convolution enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.
According to the present invention, there is provided a table reconstruction method, wherein the obtaining a table image includes: performing size transformation processing on the image to be processed according to a preset size; and identifying a table position in the image to be processed, and dividing the table image from the image to be processed according to the table position.
According to the present invention, there is provided a table reconstruction method, wherein the extracting the image features of the table image includes: determining a feature representation of the table image through a convolutional neural network, and determining a multi-scale feature representation of that feature representation through a feature pyramid network to obtain the image features.
According to the present invention, before the table image is acquired, the method further includes: acquiring training data, wherein the training data comprises a training image and an image label, and the image label comprises a category label, a coordinate label and a pixel label; inputting the training image into a table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask; determining a first loss according to the first cell category and the category label, determining a second loss according to the first cell coordinates and the coordinate label, and determining a third loss according to the first cell pixel mask and the pixel label; weighting and summing the first loss, the second loss and the third loss based on predefined weight coefficients to obtain a target loss; and updating the model parameters of the table recognition model according to the target loss.
According to the present invention, there is provided a table reconstruction method, wherein the extracting image features of the table image and determining cell categories, cell coordinates and cell pixel masks of the table image according to the image features includes: extracting image features of the table image through the table recognition model, and predicting cell categories, cell coordinates and cell pixel masks of the table image according to the image features.
The invention also provides a table reconstruction device, which comprises: an acquisition module and a processing module; the acquisition module is used for acquiring a table image; the processing module is used for extracting the image features of the table image and determining the cell categories, cell coordinates and cell pixel masks of the table image according to the image features; performing grid line reconstruction according to the cell coordinates and the cell pixel masks to obtain a first table, and performing cell merging on the first table according to the cell categories to obtain a second table; the cell categories comprise blank cells, basic cells and merged cells.
According to the present invention, there is provided a table reconstruction device, wherein the processing module is configured to: determining a first candidate box and a first candidate feature; predicting a target object feature from the image feature, the first candidate box and the first candidate feature; and obtaining the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.
According to the present invention, there is provided a table reconstruction device, wherein the processing module is configured to: performing multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature; performing box region-of-interest alignment on the first candidate box and the image feature to obtain a first box feature; performing dynamic-convolution enhancement on the second candidate feature and the first box feature to obtain a second box feature, determining a second candidate box based on the second box feature, and updating the first candidate box to the second candidate box; performing mask region-of-interest alignment on the second box feature and the image feature to obtain a first pixel mask; and performing dynamic-convolution enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.
According to the invention, there is provided a table reconstruction device, wherein the acquisition module is used for: performing size transformation processing on the image to be processed according to a preset size; and identifying a table position in the image to be processed, and dividing the table image from the image to be processed according to the table position.
According to the present invention, there is provided a table reconstruction device, wherein the processing module is configured to: determining a feature representation of the table image through a convolutional neural network, and determining a multi-scale feature representation of that feature representation through a feature pyramid network to obtain the image features.
According to the invention, there is provided a table reconstruction device, wherein the acquisition module is used for: acquiring training data, wherein the training data comprises a training image and an image label, and the image label comprises a category label, a coordinate label and a pixel label; the processing module is used for: inputting the training image into a table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask; determining a first loss according to the first cell category and the category label, determining a second loss according to the first cell coordinates and the coordinate label, and determining a third loss according to the first cell pixel mask and the pixel label; weighting and summing the first loss, the second loss and the third loss based on predefined weight coefficients to obtain a target loss; and updating the model parameters of the table recognition model according to the target loss.
According to the present invention, there is provided a table reconstruction device, wherein the processing module is configured to: extracting image features of the table image through the table recognition model, and predicting cell categories, cell coordinates and cell pixel masks of the table image according to the image features.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the table reconstruction method as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a table reconstruction method as described in any of the above.
The table reconstruction method, apparatus, electronic device and storage medium provided by the invention can acquire a table image; extract image features of the table image, and determine cell categories, cell coordinates and cell pixel masks of the table image according to the image features; perform grid line reconstruction according to the cell coordinates and the cell pixel masks to obtain a first table, and perform cell merging on the first table according to the cell categories to obtain a second table; the cell categories comprise blank cells, basic cells and merged cells. According to this scheme, the cell category, cell coordinates and cell pixel mask of the table image can be determined from the image features, and the second table can be reconstructed from them.
Drawings
To more clearly illustrate the technical solutions of the invention or of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the invention, and that a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a table reconstruction method according to the present invention;
- FIG. 2 is a second schematic flow chart of the table reconstruction method according to the present invention;
- FIG. 3 is a third schematic flow chart of the table reconstruction method according to the present invention;
FIG. 4 is a schematic diagram of a table reconstruction device according to the present invention;
Fig. 5 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to mean serving as an example, instance or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present the related concepts in a concrete fashion.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element. Furthermore, the methods and apparatus in the embodiments of the present invention are not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed substantially simultaneously or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted or combined. Additionally, features described with reference to certain examples may be combined in other examples.
In order to clearly describe the technical solution of the embodiment of the present invention, in the embodiment of the present invention, the words "first", "second", etc. are used to distinguish identical items or similar items having substantially the same function and effect, and those skilled in the art will understand that the words "first", "second", etc. are not limited in number and execution order.
Some exemplary embodiments of the invention have been described for illustrative purposes; it should be understood that the invention may be practiced otherwise than as specifically shown in the accompanying drawings.
The solution outlined above is described in detail below with reference to specific embodiments and the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides a table reconstruction method, which can be applied to a table reconstruction device. The table reconstruction method may include S101-S103:
S101, a table reconstruction device acquires a table image.
Here, the table image is an image containing table elements.
Optionally, the table reconstruction device may perform size transformation processing on the image to be processed according to a preset size; and identifying a table position in the image to be processed, and dividing the table image from the image to be processed according to the table position.
Specifically, the table reconstruction device may perform a preprocessing operation on the input image. The preprocessing first resizes the input image to a preset size (imgW × imgH) using an interpolation algorithm, where imgW denotes the image width and imgH the image height; then detects the table position in the input image using a table detection algorithm; and finally crops the image containing the table according to the table position to obtain the table image, which may be represented as a matrix of shape imgW × imgH × imgC, where imgC denotes the number of image channels.
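The preprocessing step above can be sketched as follows. This is an illustrative simplification, not the patent's implementation: the image is modeled as a row-major nested list, nearest-neighbour interpolation stands in for the unspecified interpolation algorithm, and the table box is assumed to come from an external table detection algorithm.

```python
# Hypothetical sketch of the preprocessing described above: resize the input
# image to a preset size (imgW x imgH), then crop the detected table region.

def resize_nearest(image, new_w, new_h):
    """Nearest-neighbour resize of a row-major pixel grid."""
    old_h, old_w = len(image), len(image[0])
    return [[image[y * old_h // new_h][x * old_w // new_w]
             for x in range(new_w)]
            for y in range(new_h)]

def crop_table(image, box):
    """Cut out the table region given box = (x0, y0, x1, y1), exclusive end."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

# Usage: a 4x4 "image" of (row, col) pixels resized to 2x2, then a
# 2-wide, 1-high table region cropped out of the resized image.
img = [[(y, x) for x in range(4)] for y in range(4)]
small = resize_nearest(img, 2, 2)
table = crop_table(small, (0, 0, 2, 1))
```

In a real pipeline the nested lists would be an imgW × imgH × imgC tensor and the box would carry a detection confidence, but the resize-then-crop control flow is the same.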
S102, a table reconstruction device extracts image features of the table image and determines cell categories, cell coordinates and cell pixel masks of the table image according to the image features.
The cell categories include blank cells, basic cells and merged cells.
Optionally, the table reconstruction device may determine a feature representation of the table image through a convolutional neural network, and determine a multi-scale feature representation from it through a feature pyramid network, thereby obtaining the image features.
Specifically, the table reconstruction device may apply repeated operations such as convolution, pooling, residual connection and activation through the convolutional neural network to obtain a feature representation of the table image, then obtain a multi-scale feature representation of the table image through the feature pyramid network, and finally obtain the image features of the table image, which may be represented as a matrix of shape W_img × H_img × C_img.
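The feature-pyramid merge can be illustrated with a toy one-dimensional sketch. This is an assumption about the FPN variant used: real implementations operate on 2-D feature maps and apply learned lateral 1×1 convolutions, which are omitted here so the top-down pathway stands out.

```python
# Toy 1-D sketch of a feature-pyramid top-down pathway: upsample the coarser
# level and add it into each finer lateral feature map.

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 1-D feature map."""
    return [v for v in feat for _ in range(2)]

def fpn_merge(laterals):
    """laterals: feature maps ordered finest first. The coarsest level
    passes through; each finer level receives the upsampled sum so far."""
    merged = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        up = upsample2x(merged[0])
        merged.insert(0, [a + b for a, b in zip(lat, up)])
    return merged

# Three pyramid levels, finest first: lengths 8, 4, 2.
levels = [[1.0] * 8, [2.0] * 4, [3.0] * 2]
out = fpn_merge(levels)
```

Every output level now mixes its own resolution with all coarser context, which is what gives the multi-scale feature representation mentioned above.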
Optionally, the table reconstruction device may determine a first candidate box and a first candidate feature; predict a target object feature from the image features, the first candidate box and the first candidate feature; and obtain the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.
Specifically, the table reconstruction device may encode the input image features with an encoder group and then decode them for the different targets with a decoder group. First, the table reconstruction device may randomly initialize the learnable first candidate boxes and first candidate features; the number of first candidate boxes and first candidate features may be set to 300, the dimension of each first candidate box to 4, and the dimension of each first candidate feature to 256. The candidate features and candidate boxes then constrain each other through a dynamic convolution module and are encoded into the final object features. Concretely, the table reconstruction device may perform a multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature, and perform box region-of-interest alignment between the first candidate box and the image features using a bilinear interpolation algorithm to obtain a first box feature, so that box features are obtained more accurately and sensitivity to small regions is improved. It then enhances the second candidate feature and the first box feature through the dynamic convolution module to obtain a second box feature, determines a second candidate box based on the second box feature, and updates the first candidate box to the second candidate box. Next, mask region-of-interest alignment is performed between the second box feature and the image features to obtain a first pixel mask, and the second candidate feature and the first pixel mask are enhanced through the dynamic convolution module to obtain the target object feature.
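The self-attention transform applied to the candidate features can be sketched as below. This is a deliberate simplification of the multi-head self-attention named in the text: a single head with no learned query/key/value projections, on tiny hand-made vectors, just to show how each candidate feature becomes an attention-weighted mix of all candidates.

```python
import math

# Single-head, projection-free self-attention over candidate feature vectors
# (an illustrative stand-in for the multi-head transform in the patent).

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(feats):
    """feats: list of candidate feature vectors (lists of floats).
    Each output vector is an attention-weighted mix of all inputs."""
    d = len(feats[0])
    out = []
    for q in feats:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in feats]     # scaled dot-product scores
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, feats))
                    for j in range(d)])
    return out

# Two orthogonal candidate features; after attention each one still leans
# toward itself but has absorbed information from the other.
cands = [[1.0, 0.0], [0.0, 1.0]]
mixed = self_attention(cands)
```

In the pipeline described above this mixing is what lets the 300 candidates exchange information before the dynamic convolution module ties them to their boxes.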
After the target object feature is obtained, the table reconstruction device may obtain the cell category by a classification decoder, the cell coordinates by a bounding box decoder, and the cell pixel mask by a pixel mask decoder.
Optionally, after obtaining the second box feature, a candidate box decoder may classify and regress the second box feature to obtain the second candidate box, and replace the first candidate box with the second candidate box, thereby updating the first candidate box.
Alternatively, the classification decoder may include 5 linear layers and the bounding box decoder 3 linear layers, while the pixel mask decoder may include four consecutive convolution layers for capturing hierarchical features of the input, one deconvolution layer for upsampling the spatial resolution of the image features, and one 1×1 convolution layer for reducing the number of channels of the image features, producing the mask prediction for the current stage.
Optionally, before acquiring the table image, the table reconstruction device may acquire training data, where the training data includes a training image and an image label, and the image label includes a category label, a coordinate label and a pixel label; input the training image into a table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask; determine a first loss from the first cell category and the category label, a second loss from the first cell coordinates and the coordinate label, and a third loss from the first cell pixel mask and the pixel label; weight and sum the first, second and third losses based on predefined weight coefficients to obtain a target loss; and update the model parameters of the table recognition model according to the target loss using the AdamW optimization algorithm.
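The target-loss combination described above can be written as a one-line weighted sum. The weight values below are placeholders for illustration; the patent states only that the coefficients are predefined, not what they are.

```python
# Sketch of the target loss: a weighted sum of the classification (category),
# coordinate and pixel-mask branch losses. Weights are assumed values.

def target_loss(cls_loss, box_loss, mask_loss,
                w_cls=1.0, w_box=2.0, w_mask=1.0):
    """Combine the three branch losses with predefined weight coefficients."""
    return w_cls * cls_loss + w_box * box_loss + w_mask * mask_loss

# Usage: three per-batch branch losses merged into one scalar for the
# optimizer (AdamW in the text) to back-propagate.
loss = target_loss(0.5, 0.25, 0.1)
```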
Specifically, as shown in fig. 2, after the training image is input into the table recognition model, the model may first extract image features from the training image, then perform feature encoding through the encoder group, then feature decoding through the decoder group, then compute the target loss from the decoding result and the image label, and finally perform a gradient update of the table recognition model based on the target loss.
Optionally, as shown in fig. 3, the table reconstruction device may extract the image features of the table image through the above table recognition model, and predict the cell categories, cell coordinates and cell pixel masks of the table image according to the image features.
And S103, the table reconstruction device reconstructs grid lines according to the cell coordinates and the cell pixel mask to obtain a first table, and performs cell merging on the first table according to the cell category to obtain a second table.
Specifically, with continued reference to fig. 3, for the output set of cell coordinates, the table reconstruction device may filter out cell coordinates whose confidence is lower than a preset threshold using a non-maximum suppression algorithm, then reconstruct virtual grid lines from the filtered cell coordinate set and the cell pixel mask regions, using set horizontal-line and vertical-line thresholds, to obtain the first table; merge the cells of the first table according to the output cell categories; generate the logical structure of the table from the merged cell set; and backfill the cell contents to obtain the reconstructed second table.
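The filtering and grid-line steps of S103 can be sketched as follows. The data layout (cells as `(x0, y0, x1, y1, confidence)` tuples) and the threshold values are assumptions for the sketch; the patent's vertical/horizontal line thresholds are represented by a single `snap` distance within which nearby cell edges collapse onto one virtual grid line.

```python
# Illustrative sketch of S103: drop low-confidence cells, then derive
# virtual grid lines by clustering nearly-equal cell edge coordinates.

def filter_cells(cells, conf_thresh=0.5):
    """cells: list of (x0, y0, x1, y1, confidence) tuples."""
    return [c for c in cells if c[4] >= conf_thresh]

def grid_lines(edges, snap=2.0):
    """Cluster edge coordinates that lie within `snap` of each other and
    return one averaged grid-line position per cluster."""
    lines = []
    for e in sorted(edges):
        if lines and e - lines[-1][-1] <= snap:
            lines[-1].append(e)
        else:
            lines.append([e])
    return [sum(g) / len(g) for g in lines]

# Usage: two confident side-by-side cells plus one low-confidence duplicate;
# their left/right edges yield three vertical grid lines (51 snaps onto 50).
cells = [(0, 0, 50, 20, 0.9), (51, 0, 100, 20, 0.8), (0, 0, 10, 10, 0.3)]
kept = filter_cells(cells)
v_lines = grid_lines([c[0] for c in kept] + [c[2] for c in kept])
```

Horizontal grid lines follow the same clustering over `y0`/`y1` edges; the resulting line grid is the first table, on which cell merging by category then operates.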
In the training process, the embodiment of the invention can fuse the interdependence between cells and exploit structure and content information simultaneously; different cells share information and part of the model parameters, which improves the training effect of the model. During testing and inference, only the regression boxes that fuse cell layout and content information need to be inferred, so the required model storage space is small; in the testing stage the model converts the image containing the table directly into a structured sequence, which greatly reduces the required decoding time. The recognition performance of the table image recognition architecture can thus be effectively improved in both quality and efficiency.
In the embodiment of the invention, the cell category, cell coordinates and cell pixel mask of the table image can be determined according to the image features, and the second table can be reconstructed according to them.
The foregoing description of the solution provided by the embodiments of the present invention has been mainly presented in terms of a method. To achieve the above functions, it includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the table reconstruction method provided by the embodiment of the invention, the execution body may be a table reconstruction device or a control module for table reconstruction in the table reconstruction device. In the embodiment of the present invention, a table reconstruction method performed by a table reconstruction device is taken as an example, and the table reconstruction device provided by the embodiment of the present invention is described.
It should be noted that, in the embodiment of the present invention, the table reconstruction device may be divided into functional modules according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. Optionally, the division of the modules in the embodiment of the present invention is schematic, which is merely a logic function division, and other division manners may be implemented in practice.
As shown in fig. 4, an embodiment of the present invention provides a table reconstruction apparatus 400. The table reconstruction apparatus 400 includes: an acquisition module 401 and a processing module 402. The acquisition module 401 is configured to acquire a table image; the processing module 402 is configured to extract image features of the table image, and determine cell categories, cell coordinates and cell pixel masks of the table image according to the image features; perform grid line reconstruction according to the cell coordinates and the cell pixel masks to obtain a first table, and perform cell merging on the first table according to the cell categories to obtain a second table; the cell categories comprise blank cells, basic cells and merged cells.
Optionally, the processing module 402 is configured to: determine a first candidate frame and a first candidate feature; predict a target object feature from the image features, the first candidate frame, and the first candidate feature; and obtain the cell category, the cell coordinates, and the cell pixel mask by decoding the target object feature.
Optionally, the processing module 402 is configured to: perform multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature; perform frame region-of-interest alignment on the first candidate frame and the image features to obtain a first frame feature; perform dynamic-convolution enhancement on the second candidate feature and the first frame feature to obtain a second frame feature, determine a second candidate frame based on the second frame feature, and update the first candidate frame to the second candidate frame; perform mask region-of-interest alignment on the second frame feature and the image features to obtain a first pixel mask; and perform dynamic-convolution enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.
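The order of operations in this prediction step can be traced with a data-flow sketch. The helper functions below are stubs standing in for the patent's operators (multi-head self-attention, region-of-interest alignment, dynamic convolution); their bodies are placeholders that only record the composition, not real network layers.

```python
# Data-flow sketch of the target-object-feature prediction. Stub
# operators record how the features compose; they are assumptions
# standing in for the actual attention / RoI-align / dynamic-conv layers.
def self_attention(cand_feat):        return f"attn({cand_feat})"
def frame_roi_align(box, img_feat):   return f"roi({box},{img_feat})"
def mask_roi_align(feat, img_feat):   return f"mroi({feat},{img_feat})"
def dynamic_conv(a, b):               return f"dyn({a},{b})"
def regress_frame(frame_feat):        return f"box({frame_feat})"

def predict_target_object_feature(img_feat, cand_frame, cand_feat):
    feat2 = self_attention(cand_feat)               # second candidate feature
    frame1 = frame_roi_align(cand_frame, img_feat)  # first frame feature
    frame2 = dynamic_conv(feat2, frame1)            # second frame feature
    cand_frame = regress_frame(frame2)              # updated candidate frame
    # (the updated frame would seed the next refinement round)
    mask1 = mask_roi_align(frame2, img_feat)        # first pixel mask
    return dynamic_conv(feat2, mask1)               # target object feature
```

Running `predict_target_object_feature("F", "B", "Q")` shows that the second candidate feature is reused twice: once to enhance the frame feature and once to enhance the pixel mask into the final target object feature.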
Optionally, the acquisition module 401 is configured to: perform size transformation processing on an image to be processed according to a preset size; and identify a table position in the image to be processed, and segment the table image from the image to be processed according to the table position.
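The size-transformation and table-cropping arithmetic can be illustrated as follows. The preset size of 640×640 and the function names are assumptions; the patent only requires that a preset size exists and that the table region be segmented by its detected position.

```python
# Sketch of the acquisition step: scale the image to fit a preset size,
# then map the detected table position into the resized coordinates.
# The 640x640 preset is an assumed example value.
PRESET = (640, 640)  # (width, height)

def resize_scale(orig_size, preset=PRESET):
    """Uniform scale factor that fits orig_size inside the preset size."""
    ow, oh = orig_size
    pw, ph = preset
    return min(pw / ow, ph / oh)

def crop_table(table_pos, scale):
    """Map a detected table position (x0, y0, x1, y1) in the original
    image into the resized image's coordinate system."""
    return tuple(round(v * scale) for v in table_pos)

s = resize_scale((1280, 960))               # 0.5 for this example input
print(crop_table((100, 200, 900, 700), s))  # table box after resizing
```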
Optionally, the processing module 402 is configured to determine a feature expression of the table image through a convolutional neural network, and determine a multi-scale feature expression of the feature expression through a feature pyramid network to obtain the image features.
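At the shape level, the multi-scale feature expression of a feature pyramid network can be sketched by the downsampling strides of its levels. The strides (4, 8, 16, 32) correspond to the typical FPN levels P2–P5 and are an assumption here; the patent does not fix them.

```python
# Shape-level sketch of the multi-scale feature expression produced by
# a backbone + feature pyramid network. Strides 4/8/16/32 (P2-P5) are
# assumed typical values, not specified by the patent.
def fpn_feature_sizes(image_size, strides=(4, 8, 16, 32)):
    w, h = image_size
    return [(w // s, h // s) for s in strides]

print(fpn_feature_sizes((640, 640)))
# each pyramid level halves the spatial resolution of the previous one
```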
Optionally, the acquisition module 401 is configured to acquire training data, wherein the training data comprises a training image and an image label, and the image label comprises a category label, a coordinate label, and a pixel label. The processing module 402 is configured to: input the training image into a table recognition model to obtain a first cell category, first cell coordinates, and a first cell pixel mask; determine a first loss according to the first cell category and the category label, a second loss according to the first cell coordinates and the coordinate label, and a third loss according to the first cell pixel mask and the pixel label; compute a weighted sum of the first loss, the second loss, and the third loss based on predefined weight coefficients to obtain a target loss; and update the model parameters of the table recognition model according to the target loss.
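The target-loss computation in this training step is a plain weighted sum. The weight values in the sketch below are illustrative assumptions; the patent only states that the coefficients are predefined.

```python
# Sketch of the target loss: a weighted sum of the category (first),
# coordinate (second), and pixel-mask (third) losses. The example
# weights are assumptions, not values from the patent.
def target_loss(cls_loss, box_loss, mask_loss, weights=(1.0, 2.0, 1.0)):
    w1, w2, w3 = weights
    return w1 * cls_loss + w2 * box_loss + w3 * mask_loss

print(target_loss(0.4, 0.1, 0.3))  # roughly 0.9 with the example weights
```

The resulting scalar is what backpropagation would minimize when updating the table recognition model's parameters.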
Optionally, the processing module 402 is configured to extract the image features of the table image through the table recognition model, and predict the cell category, cell coordinates, and cell pixel mask of the table image according to the image features.
In the embodiment of the invention, the cell category, cell coordinates, and cell pixel mask of the table image can be determined according to the image features, and the second table is reconstructed according to the cell category, the cell coordinates, and the cell pixel mask.
Fig. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 5, the electronic device may include: a processor 510, a communications interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communications interface 520, and the memory 530 communicate with each other through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a table reconstruction method comprising: acquiring a table image; extracting image features of the table image, and determining a cell category, cell coordinates, and a cell pixel mask of the table image according to the image features; carrying out grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and carrying out cell merging on the first table according to the cell category to obtain a second table; wherein the cell categories comprise blank cells, basic cells, and merged cells.
Further, the logic instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the table reconstruction method provided above, the method comprising: acquiring a table image; extracting image features of the table image, and determining a cell category, cell coordinates, and a cell pixel mask of the table image according to the image features; carrying out grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and carrying out cell merging on the first table according to the cell category to obtain a second table; wherein the cell categories comprise blank cells, basic cells, and merged cells.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the table reconstruction method provided above, the method comprising: acquiring a table image; extracting image features of the table image, and determining a cell category, cell coordinates, and a cell pixel mask of the table image according to the image features; carrying out grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and carrying out cell merging on the first table according to the cell category to obtain a second table; wherein the cell categories comprise blank cells, basic cells, and merged cells.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features thereof may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A table reconstruction method, comprising:
acquiring a table image;
extracting image features of the table image, and determining a cell category, cell coordinates, and a cell pixel mask of the table image according to the image features; and
carrying out grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and carrying out cell merging on the first table according to the cell category to obtain a second table;
wherein the cell categories comprise blank cells, basic cells, and merged cells.
2. The table reconstruction method according to claim 1, wherein the determining a cell category, cell coordinates, and a cell pixel mask of the table image according to the image features comprises:
determining a first candidate frame and a first candidate feature;
predicting a target object feature from the image features, the first candidate frame, and the first candidate feature; and
obtaining the cell category, the cell coordinates, and the cell pixel mask by decoding the target object feature.
3. The table reconstruction method according to claim 2, wherein the predicting a target object feature from the image features, the first candidate frame, and the first candidate feature comprises:
performing multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature;
performing frame region-of-interest alignment on the first candidate frame and the image features to obtain a first frame feature;
performing dynamic-convolution enhancement on the second candidate feature and the first frame feature to obtain a second frame feature, determining a second candidate frame based on the second frame feature, and updating the first candidate frame to the second candidate frame;
performing mask region-of-interest alignment on the second frame feature and the image features to obtain a first pixel mask; and
performing dynamic-convolution enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.
4. The table reconstruction method according to claim 1, wherein the acquiring a table image comprises:
performing size transformation processing on an image to be processed according to a preset size; and
identifying a table position in the image to be processed, and segmenting the table image from the image to be processed according to the table position.
5. The table reconstruction method according to claim 1, wherein the extracting image features of the table image comprises:
determining a feature expression of the table image through a convolutional neural network, and determining a multi-scale feature expression of the feature expression through a feature pyramid network to obtain the image features.
6. The table reconstruction method according to any one of claims 1 to 5, wherein, before the acquiring a table image, the method further comprises:
acquiring training data, wherein the training data comprises a training image and an image label, and the image label comprises a category label, a coordinate label, and a pixel label;
inputting the training image into a table recognition model to obtain a first cell category, first cell coordinates, and a first cell pixel mask;
determining a first loss according to the first cell category and the category label, a second loss according to the first cell coordinates and the coordinate label, and a third loss according to the first cell pixel mask and the pixel label;
computing a weighted sum of the first loss, the second loss, and the third loss based on predefined weight coefficients to obtain a target loss; and
updating model parameters of the table recognition model according to the target loss.
7. The table reconstruction method according to claim 6, wherein the extracting image features of the table image and determining a cell category, cell coordinates, and a cell pixel mask of the table image according to the image features comprises:
extracting the image features of the table image through the table recognition model, and predicting the cell category, the cell coordinates, and the cell pixel mask of the table image according to the image features.
8. A table reconstruction apparatus, comprising an acquisition module and a processing module, wherein:
the acquisition module is configured to acquire a table image; and
the processing module is configured to extract image features of the table image and determine a cell category, cell coordinates, and a cell pixel mask of the table image according to the image features; carry out grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table; and carry out cell merging on the first table according to the cell category to obtain a second table;
wherein the cell categories comprise blank cells, basic cells, and merged cells.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the table reconstruction method according to any one of claims 1 to 7 when executing the program.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the table reconstruction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410102694.3A CN117973337B (en) | 2024-01-24 | 2024-01-24 | Table reconstruction method, apparatus, electronic device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117973337A true CN117973337A (en) | 2024-05-03 |
CN117973337B CN117973337B (en) | 2024-10-11 |
Family
ID=90856737
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190266394A1 (en) * | 2018-02-26 | 2019-08-29 | Abc Fintech Co., Ltd. | Method and device for parsing table in document image |
CN111932545A (en) * | 2020-07-14 | 2020-11-13 | 浙江大华技术股份有限公司 | Image processing method, target counting method and related device thereof |
CN113297975A (en) * | 2021-05-25 | 2021-08-24 | 新东方教育科技集团有限公司 | Method and device for identifying table structure, storage medium and electronic equipment |
CN114529925A (en) * | 2022-04-22 | 2022-05-24 | 华南理工大学 | Method for identifying table structure of whole line table |
CN115331245A (en) * | 2022-10-12 | 2022-11-11 | 中南民族大学 | Table structure identification method based on image instance segmentation |
CN115761773A (en) * | 2022-11-17 | 2023-03-07 | 上海交通大学 | Deep learning-based in-image table identification method and system |
WO2023134447A1 (en) * | 2022-01-12 | 2023-07-20 | 华为技术有限公司 | Data processing method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |