CN117973337A - Table reconstruction method, device, electronic device and storage medium

Info

Publication number
CN117973337A
Authority
CN
China
Prior art keywords
image
cell
feature
candidate
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410102694.3A
Other languages
Chinese (zh)
Other versions
CN117973337B (en
Inventor
张亚萍
庞刘成
赵阳
周玉
宗成庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202410102694.3A
Publication of CN117973337A
Application granted
Publication of CN117973337B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a table reconstruction method, a table reconstruction device, an electronic device and a storage medium, which are applied to the technical field of image processing. The method comprises the following steps: acquiring a table image; extracting image features of the table image, and determining the cell category, cell coordinates and cell pixel mask of the table image according to the image features; performing grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and performing cell merging on the first table according to the cell category to obtain a second table; wherein the cell categories comprise blank cells, basic cells and merged cells.

Description

Table reconstruction method, device, electronic device and storage medium

Technical Field

The present invention relates to the field of image processing technology, and in particular to a table reconstruction method, device, electronic device and storage medium.

Background Art

Table recognition technology uses a computer system to automatically parse the table regions contained in an image into structured tables and store them. It helps people recognize and understand table content in images quickly and effectively, and can quickly parse a table in an image into a computer-readable format, which facilitates electronic storage of the table content and subsequent content analysis.

In the prior art, the table recognition architecture first runs the structure recognition branch and the content recognition branch of the table image separately, and then merges their results to obtain a structured parse of the table content.

However, because the structure recognition branch and the content recognition branch differ in modal information, the content recognition branch lacks the interdependence between structures, and recognition performance is poor.

Summary of the Invention

The present invention provides a table reconstruction method, device, electronic device and storage medium to solve the problem that, in prior-art table recognition technology, the content recognition branch lacks the interdependence between structures and recognition performance is poor.

The present invention provides a table reconstruction method, comprising: acquiring a table image; extracting image features of the table image, and determining the cell category, cell coordinates and cell pixel mask of the table image according to the image features; performing grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and performing cell merging on the first table according to the cell category to obtain a second table; wherein the cell categories include blank cells, basic cells and merged cells.

According to the table reconstruction method provided by the present invention, determining the cell category, cell coordinates and cell pixel mask of the table image according to the image features includes: determining a first candidate box and a first candidate feature; predicting a target object feature according to the image features, the first candidate box and the first candidate feature; and obtaining the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.

According to the table reconstruction method provided by the present invention, determining the target object feature according to the image features, the first candidate box and the first candidate feature includes: performing a multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature; performing box region-of-interest alignment on the first candidate box and the image features to obtain a first box feature; performing dynamic convolution module enhancement on the second candidate feature and the first box feature to obtain a second box feature, determining a second candidate box based on the second box feature, and updating the first candidate box to the second candidate box; performing mask region-of-interest alignment on the second box feature and the image features to obtain a first pixel mask; and performing dynamic convolution module enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.

According to the table reconstruction method provided by the present invention, acquiring the table image includes: resizing the image to be processed to a preset size; identifying the table position in the image to be processed, and segmenting the table image from the image to be processed according to the table position.

According to the table reconstruction method provided by the present invention, extracting the image features of the table image includes: determining a feature expression of the table image through a convolutional neural network, and determining a multi-scale feature representation of the feature expression through a feature pyramid network, to obtain the image features.

According to the table reconstruction method provided by the present invention, before acquiring the table image, the method further includes: acquiring training data, the training data including training images and image labels, the image labels including category labels, coordinate labels and pixel labels; inputting the training image into a table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask; determining a first loss according to the first cell category and the category label, determining a second loss according to the first cell coordinates and the coordinate label, and determining a third loss according to the first cell pixel mask and the pixel label; performing a weighted summation of the first loss, the second loss and the third loss based on predefined weight coefficients to obtain a target loss; and updating the model parameters of the table recognition model according to the target loss.

According to the table reconstruction method provided by the present invention, extracting the image features of the table image and determining the cell category, cell coordinates and cell pixel mask of the table image according to the image features includes: extracting the image features of the table image through the table recognition model, and predicting the cell category, cell coordinates and cell pixel mask of the table image according to the image features.

The present invention also provides a table reconstruction device, comprising an acquisition module and a processing module. The acquisition module is used to acquire a table image. The processing module is used to extract image features of the table image, and determine the cell category, cell coordinates and cell pixel mask of the table image according to the image features; and to perform grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and perform cell merging on the first table according to the cell category to obtain a second table; wherein the cell categories include blank cells, basic cells and merged cells.

According to the table reconstruction device provided by the present invention, the processing module is used to: determine a first candidate box and a first candidate feature; predict a target object feature according to the image features, the first candidate box and the first candidate feature; and obtain the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.

According to the table reconstruction device provided by the present invention, the processing module is used to: perform a multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature; perform box region-of-interest alignment on the first candidate box and the image features to obtain a first box feature; perform dynamic convolution module enhancement on the second candidate feature and the first box feature to obtain a second box feature, determine a second candidate box based on the second box feature, and update the first candidate box to the second candidate box; perform mask region-of-interest alignment on the second box feature and the image features to obtain a first pixel mask; and perform dynamic convolution module enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.

According to the table reconstruction device provided by the present invention, the acquisition module is used to: resize the image to be processed to a preset size; identify the table position in the image to be processed, and segment the table image from the image to be processed according to the table position.

According to the table reconstruction device provided by the present invention, the processing module is used to: determine a feature expression of the table image through a convolutional neural network, and determine a multi-scale feature representation of the feature expression through a feature pyramid network, to obtain the image features.

According to the table reconstruction device provided by the present invention, the acquisition module is used to: acquire training data, the training data including training images and image labels, the image labels including category labels, coordinate labels and pixel labels. The processing module is used to: input the training image into a table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask; determine a first loss according to the first cell category and the category label, determine a second loss according to the first cell coordinates and the coordinate label, and determine a third loss according to the first cell pixel mask and the pixel label; perform a weighted summation of the first loss, the second loss and the third loss based on predefined weight coefficients to obtain a target loss; and update the model parameters of the table recognition model according to the target loss.

According to the table reconstruction device provided by the present invention, the processing module is used to: extract the image features of the table image through the table recognition model, and predict the cell category, cell coordinates and cell pixel mask of the table image according to the image features.

The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any of the table reconstruction methods described above.

The present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any of the table reconstruction methods described above.

The table reconstruction method, device, electronic device and storage medium provided by the present invention can acquire a table image; extract image features of the table image, and determine the cell category, cell coordinates and cell pixel mask of the table image according to the image features; and perform grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and perform cell merging on the first table according to the cell category to obtain a second table; wherein the cell categories include blank cells, basic cells and merged cells. With this solution, the cell category, cell coordinates and cell pixel mask of the table image can be determined from the image features, and the second table can be reconstructed from them. Because the cell category, cell coordinates and cell pixel mask can incorporate the interdependence between cells, information from both structure and content can be used together to reconstruct the table, thereby improving recognition performance.

Brief Description of the Drawings

To explain the technical solutions of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is the first schematic flow chart of the table reconstruction method provided by the present invention;

FIG. 2 is the second schematic flow chart of the table reconstruction method provided by the present invention;

FIG. 3 is the third schematic flow chart of the table reconstruction method provided by the present invention;

FIG. 4 is a schematic structural diagram of the table reconstruction device provided by the present invention;

FIG. 5 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

To make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplary" or "for example" should not be interpreted as preferred over, or more advantageous than, other embodiments or designs. Rather, these words are intended to present the related concepts in a concrete way.

It should be noted that, in this document, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element introduced by the phrase "comprises a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it. In addition, the scope of the methods and devices in the embodiments of the present invention is not limited to performing functions in the order shown or discussed; functions may also be performed substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted or combined. Features described with reference to certain examples may also be combined in other examples.

To describe the technical solutions of the embodiments clearly, the words "first", "second" and the like are used to distinguish between identical or similar items that have essentially the same function and effect. Those skilled in the art will understand that these words do not limit quantity or order of execution.

The embodiments of the present invention describe some exemplary embodiments for the purpose of explanation. It should be understood that the present invention can also be implemented in other ways not specifically shown in the drawings.

The above implementations are described in detail below with reference to specific embodiments and the drawings.

As shown in FIG. 1, an embodiment of the present invention provides a table reconstruction method, which can be applied to a table reconstruction device. The table reconstruction method may include S101-S103:

S101. The table reconstruction device acquires a table image.

The table image is an image containing table elements.

Optionally, the table reconstruction device may resize the image to be processed to a preset size, identify the table position in the image to be processed, and segment the table image from the image to be processed according to the table position.

Specifically, the table reconstruction device may preprocess the input image. The preprocessing first uses an interpolation algorithm to resize the input image to a preset size (img_W × img_H), where img_W is the image width and img_H is the image height. A table image detection algorithm then detects the table position in the input image. Finally, the region containing the table is segmented out according to the table position to obtain the table image, which can be represented as a matrix (img_W × img_H × img_C), where img_C is the number of image channels.
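The preprocessing in S101 can be sketched in a few lines of plain Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the image is a single-channel list of rows, bilinear interpolation is assumed as the interpolation algorithm, and `detect_table` is a toy stand-in (bounding box of dark pixels) for the unspecified table image detection algorithm.

```python
def resize_bilinear(img, out_w, out_h):
    """Resize a single-channel image (list of rows) to out_w x out_h."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        # Map the output pixel centre back into input coordinates.
        sy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def detect_table(img, dark_below=128):
    """Toy stand-in for a table detector: bounding box of dark pixels.

    Returns (x0, y0, x1, y1) with exclusive right and bottom edges.
    """
    ys = [y for y, row in enumerate(img) if any(v < dark_below for v in row)]
    xs = [x for x in range(len(img[0])) if any(row[x] < dark_below for row in img)]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def crop_table(img, box):
    """Cut the table region out of the resized image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]
```

A caller would chain the three steps in the order the text gives: `resized = resize_bilinear(page, img_w, img_h)`, then `crop_table(resized, detect_table(resized))`.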

S102. The table reconstruction device extracts image features of the table image, and determines the cell category, cell coordinates and cell pixel mask of the table image according to the image features.

The cell categories include blank cells, basic cells and merged cells.

Optionally, the table reconstruction device may determine a feature expression of the table image through a convolutional neural network, and determine a multi-scale feature representation of the feature expression through a feature pyramid network, to obtain the image features.

Specifically, the table reconstruction device may apply repeated operations such as convolution, pooling, residual connection and activation through a convolutional neural network to obtain the feature expression of the table image. A feature pyramid network then produces a multi-scale feature representation of that feature expression, which yields the image features of the table image. The image features can be represented as a matrix W_img × H_img × C_img.
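The multi-scale step can be illustrated with the top-down pathway commonly used in feature pyramid networks. The sketch below makes several assumptions that the text does not confirm: feature maps are single-channel lists of rows, each backbone level is exactly half the size of the previous one, upsampling is nearest-neighbour, and the lateral 1×1 convolutions and output smoothing convolutions of a full feature pyramid network are omitted. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two maps of identical size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_pyramid(backbone_maps):
    """Build pyramid levels from backbone maps ordered finest to coarsest.

    The coarsest level is passed through unchanged; every finer level is
    the backbone map plus the upsampled level above it.
    """
    pyramid = [backbone_maps[-1]]
    for fmap in reversed(backbone_maps[:-1]):
        pyramid.insert(0, add_maps(fmap, upsample2x(pyramid[0])))
    return pyramid
```

Each returned level keeps the resolution of its backbone map while carrying context from the coarser levels, which is what lets later stages handle both small and large cells.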

Optionally, the table reconstruction device may determine a first candidate box and a first candidate feature; predict a target object feature according to the image features, the first candidate box and the first candidate feature; and obtain the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.

Specifically, the table reconstruction device may use an encoder group to encode the input image features, and then use a decoder group to decode them for the different targets. First, the table reconstruction device may randomly initialize learnable first candidate boxes and first candidate features. The number of first candidate boxes and first candidate features may be set to 300, the dimension of a first candidate box may be set to 4, and the dimension of a first candidate feature may be set to 256. Then, a dynamic convolution module makes the candidate features and the candidate boxes constrain each other, and the final object feature is obtained by encoding. This includes the following steps. The table reconstruction device may perform a multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature. It uses a bilinear interpolation algorithm to perform box region-of-interest alignment on the first candidate box and the image features to obtain a first box feature; this yields a more precise box feature and improves sensitivity to small regions. It then performs dynamic convolution module enhancement on the second candidate feature and the first box feature to obtain a second box feature, determines a second candidate box based on the second box feature, and updates the first candidate box to the second candidate box. Next, it performs mask region-of-interest alignment on the second box feature and the image features to obtain a first pixel mask, and performs dynamic convolution module enhancement on the second candidate feature and the first pixel mask to obtain the target object feature. After obtaining the target object feature, the table reconstruction device may obtain the cell category through a classification decoder, the cell coordinates through a bounding box decoder, and the cell pixel mask through a pixel mask decoder.
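The box region-of-interest alignment with bilinear interpolation can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: the feature map has one channel, `fmap[y][x]` is treated as the value at point (x, y), the box is given in feature-map coordinates, and each output bin takes a single sample at its centre (practical implementations usually average several samples per bin). The function names are hypothetical.

```python
def bilinear_sample(fmap, x, y):
    """Interpolate a single-channel feature map at a fractional location."""
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
    bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def roi_align(fmap, box, out_size=2):
    """Pool box = (x0, y0, x1, y1) into an out_size x out_size grid.

    The box is split into equal bins without rounding its edges, and each
    bin is sampled once at its centre.
    """
    x0, y0, x1, y1 = box
    bin_w = (x1 - x0) / out_size
    bin_h = (y1 - y0) / out_size
    return [
        [
            bilinear_sample(fmap, x0 + (j + 0.5) * bin_w, y0 + (i + 0.5) * bin_h)
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]
```

Because the box edges are never snapped to integer positions, a small cell still produces a feature that reflects its exact location, which is consistent with the sensitivity to small regions mentioned above.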

Optionally, after the second box feature is obtained, a candidate box decoder may classify and regress the second box feature to obtain the second candidate box, and the first candidate box is replaced with the second candidate box, thereby updating the first candidate box.

Optionally, the classification decoder may include 5 linear layers, the bounding box decoder may include 3 linear layers, and the pixel mask decoder may include four consecutive convolutional layers, one deconvolutional layer and one 1×1 convolutional layer. The four consecutive convolutional layers capture hierarchical features of the input, the deconvolutional layer upsamples the spatial resolution of the image features, and the 1×1 convolutional layer reduces the number of channels in the image features to produce the mask prediction of the current stage.

Optionally, before acquiring the table image, the table reconstruction device may acquire training data, the training data including training images and image labels, the image labels including category labels, coordinate labels and pixel labels; input the training image into the table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask; determine a first loss according to the first cell category and the category label, determine a second loss according to the first cell coordinates and the coordinate label, and determine a third loss according to the first cell pixel mask and the pixel label; perform a weighted summation of the first loss, the second loss and the third loss based on predefined weight coefficients to obtain a target loss; and use the AdamW optimization algorithm to update the model parameters of the table recognition model according to the target loss.
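The weighted summation of the three losses can be written out for a single cell as below. The text does not name the individual loss functions or the weight values, so everything specific here is an assumption: cross-entropy for the category, mean L1 distance for the coordinates, binary cross-entropy for the pixel mask, and the example weights. Function names are hypothetical.

```python
import math


def category_loss(probs, label):
    """First loss (assumed cross-entropy): probs is a predicted distribution."""
    return -math.log(probs[label])


def coordinate_loss(pred_box, gt_box):
    """Second loss (assumed mean L1 distance between box coordinates)."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box)) / len(gt_box)


def mask_loss(pred_mask, gt_mask, eps=1e-7):
    """Third loss (assumed binary cross-entropy averaged over pixels)."""
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= g * math.log(p) + (1 - g) * math.log(1 - p)
            count += 1
    return total / count


def target_loss(probs, label, pred_box, gt_box, pred_mask, gt_mask,
                weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three losses; the weights are predefined constants."""
    w_cls, w_box, w_mask = weights
    return (w_cls * category_loss(probs, label)
            + w_box * coordinate_loss(pred_box, gt_box)
            + w_mask * mask_loss(pred_mask, gt_mask))
```

In training, the resulting scalar would be backpropagated and the parameters updated with AdamW, as the paragraph above states; that step depends on a deep learning framework and is left out of this sketch.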

Specifically, as shown in FIG. 2, after the training image is input into the table recognition model, the model first extracts image features from the training image, then encodes the features with the encoder group, and then decodes them with the decoder group. The target loss is then calculated from the decoding result and the image labels, and finally the gradients of the table recognition model are updated using the target loss.

Optionally, as shown in FIG. 3, the table reconstruction device may extract image features of the table image with the above table recognition model, and predict the cell category, cell coordinates and cell pixel mask of the table image from the image features.

S103: the table reconstruction device reconstructs grid lines according to the cell coordinates and the cell pixel mask to obtain a first table, and merges cells of the first table according to the cell category to obtain a second table.

Specifically, continuing to refer to FIG. 3, for the output set of cell coordinates, the table reconstruction device may first use a non-maximum suppression algorithm to filter out cell coordinates whose confidence is below a preset threshold. Then, using a set horizontal-line threshold and vertical-line threshold, it reconstructs virtual grid lines from the filtered set of cell coordinates and the cell pixel mask regions to obtain the first table. It merges cells of the first table according to the output cell categories, generates the logical structure of the table from the merged set of cells, and backfills the cell content to obtain the reconstructed second table.
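The reconstruction step can be sketched as follows, under several simplifying assumptions that are not taken from the source: the filter is a plain confidence threshold rather than full non-maximum suppression, the pixel masks are ignored, virtual grid lines are obtained by grouping nearby box edges, and row and column spans are derived from the number of grid lines a box crosses rather than from the predicted cell category. All function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def filter_by_confidence(boxes: List[Box], scores: List[float],
                         threshold: float) -> List[Box]:
    """Keep only boxes whose confidence reaches the preset threshold."""
    return [b for b, s in zip(boxes, scores) if s >= threshold]


def cluster_lines(values: List[float], tol: float) -> List[float]:
    """Group sorted coordinates into virtual grid lines.

    A value joins the current group when it lies within `tol` of the
    group's last member; each group is represented by its mean.
    """
    lines: List[float] = []
    group: List[float] = []
    for v in sorted(values):
        if group and v - group[-1] > tol:
            lines.append(sum(group) / len(group))
            group = []
        group.append(v)
    if group:
        lines.append(sum(group) / len(group))
    return lines


def nearest_index(lines: List[float], v: float) -> int:
    """Index of the grid line closest to coordinate v."""
    return min(range(len(lines)), key=lambda i: abs(lines[i] - v))


def reconstruct(boxes: List[Box], h_tol: float, v_tol: float):
    """Return vertical lines, horizontal lines, and (row, col, rowspan, colspan) per box."""
    xs = cluster_lines([x for b in boxes for x in (b[0], b[2])], v_tol)  # vertical lines
    ys = cluster_lines([y for b in boxes for y in (b[1], b[3])], h_tol)  # horizontal lines
    cells = []
    for x1, y1, x2, y2 in boxes:
        r0, r1 = nearest_index(ys, y1), nearest_index(ys, y2)
        c0, c1 = nearest_index(xs, x1), nearest_index(xs, x2)
        cells.append((r0, c0, r1 - r0, c1 - c0))
    return xs, ys, cells


# A header spanning two columns, two body cells, and one low-confidence box.
boxes = [(0, 0, 100, 20), (0, 20, 50, 40), (50, 21, 100, 40), (3, 3, 9, 9)]
scores = [0.9, 0.8, 0.95, 0.1]
kept = filter_by_confidence(boxes, scores, threshold=0.5)
xs, ys, cells = reconstruct(kept, h_tol=3, v_tol=3)
# cells == [(0, 0, 1, 2), (1, 0, 1, 1), (1, 1, 1, 1)]
```

A box with a row span or column span greater than one corresponds to a merged cell in the logical structure; in the described method that decision would instead be driven by the predicted cell category.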

During training, embodiments of the present invention can capture the interdependence between cells and exploit structural and content information together; because different cells share information and some model parameters, the training effect of the model can be improved. During testing and inference, only the regression boxes that fuse cell layout and content information need to be inferred, so the storage space required by the model is small. In the testing phase, the model converts an image containing a table directly into a structured sequence, which greatly reduces the required decoding time and can improve the recognition performance of the table image recognition architecture in both quality and efficiency.

In embodiments of the present invention, the cell category, cell coordinates and cell pixel mask of the table image can be determined from the image features, and the second table can be reconstructed from the cell category, cell coordinates and cell pixel mask. Because the cell category, cell coordinates and cell pixel mask can incorporate the interdependence between cells, structural and content information can be used together to reconstruct the table, thereby improving recognition performance.

The foregoing mainly describes the solution provided by the embodiments of the present invention from the perspective of the method. To implement the above functions, the implementing device includes hardware structures and/or software modules corresponding to each function. Those skilled in the art will readily appreciate that, given the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

The table reconstruction method provided by the embodiments of the present invention may be executed by a table reconstruction device, or by a control module for table reconstruction within that device. The embodiments of the present invention take the case in which the table reconstruction device executes the table reconstruction method as an example to describe the table reconstruction device provided herein.

It should be noted that the embodiments of the present invention may divide the table reconstruction device into functional modules according to the above method examples. For example, one functional module may be defined for each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. Optionally, the division of modules in the embodiments of the present invention is illustrative and is only a division by logical function; other divisions are possible in actual implementations.

As shown in FIG. 4, an embodiment of the present invention provides a table reconstruction device 400. The table reconstruction device 400 includes an acquisition module 401 and a processing module 402. The acquisition module 401 is configured to acquire a table image. The processing module 402 is configured to extract image features of the table image and determine the cell category, cell coordinates and cell pixel mask of the table image from the image features; to reconstruct grid lines according to the cell coordinates and the cell pixel mask to obtain a first table; and to merge cells of the first table according to the cell category to obtain a second table. The cell categories include blank cells, basic cells and merged cells.
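One possible software division of the device into its two modules is sketched below. The class and attribute names (`CellCategory`, `TableReconstructionDevice`, `acquire`, `process`) are hypothetical, and each module is reduced to a plain callable for illustration; the source does not prescribe any particular interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class CellCategory(Enum):
    """The three cell categories named in the description."""
    BLANK = "blank"
    BASIC = "basic"
    MERGED = "merged"


@dataclass
class TableReconstructionDevice:
    acquire: Callable[[Any], Any]  # acquisition module 401: source -> table image
    process: Callable[[Any], Any]  # processing module 402: table image -> second table

    def run(self, source: Any) -> Any:
        """Acquire a table image, then reconstruct the table from it."""
        table_image = self.acquire(source)
        return self.process(table_image)


# Stand-in callables; real modules would wrap image loading and the recognition model.
device = TableReconstructionDevice(acquire=lambda s: s.upper(),
                                   process=lambda img: [img])
result = device.run("t")
```

Keeping the two modules as separate callables mirrors the logical division in FIG. 4 while leaving open whether they are backed by hardware or software.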

Optionally, the processing module 402 is configured to: determine a first candidate box and a first candidate feature; predict a target object feature from the image features, the first candidate box and the first candidate feature; and obtain the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.

Optionally, the processing module 402 is configured to: apply a multi-head self-attention transformation to the first candidate feature to obtain a second candidate feature; perform box region-of-interest alignment on the first candidate box and the image features to obtain a first box feature; apply dynamic convolution module enhancement to the second candidate feature and the first box feature to obtain a second box feature, determine a second candidate box based on the second box feature, and update the first candidate box to the second candidate box; perform mask region-of-interest alignment on the second box feature and the image features to obtain a first pixel mask; and apply dynamic convolution module enhancement to the second candidate feature and the first pixel mask to obtain the target object feature.

Optionally, the acquisition module 401 is configured to: resize the image to be processed to a preset size; and identify the table position in the image to be processed and segment the table image from the image to be processed according to the table position.
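The two acquisition steps can be illustrated with a small pure-Python sketch that represents an image as a list of pixel rows. The helper names `resize_nearest` and `crop`, the nearest-neighbour resampling, and the assumption that the table position is already available as a bounding box are all illustrative choices; the source does not specify how the table position is detected or how resizing is performed.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize to (out_h, out_w) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut out the region (x1, y1, x2, y2); x2 and y2 are exclusive."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


page = [[1, 2],
        [3, 4]]
resized = resize_nearest(page, 4, 4)   # preset size of 4 x 4
table_box = (2, 0, 4, 2)               # hypothetical detected table position
table_image = crop(resized, table_box)
# table_image == [[2, 2], [2, 2]]
```

In practice an image library would handle the resampling, and the bounding box would come from a table detection step.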

Optionally, the processing module 402 is configured to: determine a feature expression of the table image with a convolutional neural network, and determine a multi-scale feature representation of that feature expression with a feature pyramid network, to obtain the image features.

Optionally, the acquisition module 401 is configured to acquire training data, the training data including training images and image labels, and the image labels including category labels, coordinate labels and pixel labels. The processing module 402 is configured to: input the training image into a table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask; determine a first loss from the first cell category and the category label, a second loss from the first cell coordinates and the coordinate label, and a third loss from the first cell pixel mask and the pixel label; compute a weighted sum of the first loss, the second loss and the third loss using predefined weight coefficients to obtain a target loss; and update the model parameters of the table recognition model according to the target loss.

Optionally, the processing module 402 is configured to: extract image features of the table image with the table recognition model, and predict the cell category, cell coordinates and cell pixel mask of the table image from the image features.

In embodiments of the present invention, the cell category, cell coordinates and cell pixel mask of the table image can be determined from the image features, and the second table can be reconstructed from the cell category, cell coordinates and cell pixel mask. Because the cell category, cell coordinates and cell pixel mask can incorporate the interdependence between cells, structural and content information can be used together to reconstruct the table, thereby improving recognition performance.

FIG. 5 is a schematic diagram of the physical structure of an electronic device. As shown in FIG. 5, the electronic device may include a processor 510, a communications interface 520, a memory 530 and a communication bus 540, where the processor 510, the communications interface 520 and the memory 530 communicate with one another over the communication bus 540. The processor 510 may call logic instructions in the memory 530 to execute a table reconstruction method, the method including: acquiring a table image; extracting image features of the table image, and determining the cell category, cell coordinates and cell pixel mask of the table image from the image features; reconstructing grid lines according to the cell coordinates and the cell pixel mask to obtain a first table, and merging cells of the first table according to the cell category to obtain a second table; where the cell categories include blank cells, basic cells and merged cells.

In addition, the logic instructions in the memory 530 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can execute the table reconstruction method provided by the above methods, the method including: acquiring a table image; extracting image features of the table image, and determining the cell category, cell coordinates and cell pixel mask of the table image from the image features; reconstructing grid lines according to the cell coordinates and the cell pixel mask to obtain a first table, and merging cells of the first table according to the cell category to obtain a second table; where the cell categories include blank cells, basic cells and merged cells.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the table reconstruction method provided above, the method including: acquiring a table image; extracting image features of the table image, and determining the cell category, cell coordinates and cell pixel mask of the table image from the image features; reconstructing grid lines according to the cell coordinates and the cell pixel mask to obtain a first table, and merging cells of the first table according to the cell category to obtain a second table; where the cell categories include blank cells, basic cells and merged cells.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above implementations, those skilled in the art can clearly understand that each implementation may be realized by software together with a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solution, in essence, or the part of it that contributes to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A table reconstruction method, comprising:
acquiring a table image;
extracting image features of the table image, and determining a cell category, cell coordinates and a cell pixel mask of the table image according to the image features;
reconstructing grid lines according to the cell coordinates and the cell pixel mask to obtain a first table, and merging cells of the first table according to the cell category to obtain a second table;
wherein the cell categories include blank cells, basic cells and merged cells.

2. The table reconstruction method according to claim 1, wherein determining the cell category, cell coordinates and cell pixel mask of the table image according to the image features comprises:
determining a first candidate box and a first candidate feature;
predicting a target object feature according to the image features, the first candidate box and the first candidate feature;
obtaining the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.

3. The table reconstruction method according to claim 2, wherein determining the target object feature according to the image features, the first candidate box and the first candidate feature comprises:
performing a multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature;
performing box region-of-interest alignment on the first candidate box and the image features to obtain a first box feature;
performing dynamic convolution module enhancement on the second candidate feature and the first box feature to obtain a second box feature, determining a second candidate box based on the second box feature, and updating the first candidate box to the second candidate box;
performing mask region-of-interest alignment on the second box feature and the image features to obtain a first pixel mask;
performing dynamic convolution module enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.

4. The table reconstruction method according to claim 1, wherein acquiring the table image comprises:
resizing an image to be processed to a preset size;
identifying a table position in the image to be processed, and segmenting the table image from the image to be processed according to the table position.

5. The table reconstruction method according to claim 1, wherein extracting the image features of the table image comprises:
determining a feature expression of the table image by a convolutional neural network, and determining a multi-scale feature representation of the feature expression by a feature pyramid network, to obtain the image features.

6. The table reconstruction method according to any one of claims 1 to 5, wherein before acquiring the table image, the method further comprises:
acquiring training data, the training data including training images and image labels, the image labels including category labels, coordinate labels and pixel labels;
inputting the training image into a table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask;
determining a first loss according to the first cell category and the category label, determining a second loss according to the first cell coordinates and the coordinate label, and determining a third loss according to the first cell pixel mask and the pixel label;
performing a weighted summation of the first loss, the second loss and the third loss based on predefined weight coefficients to obtain a target loss;
updating the model parameters of the table recognition model according to the target loss.

7. The table reconstruction method according to claim 6, wherein extracting the image features of the table image and determining the cell category, cell coordinates and cell pixel mask of the table image according to the image features comprises:
extracting the image features of the table image by the table recognition model, and predicting the cell category, cell coordinates and cell pixel mask of the table image according to the image features.

8. A table reconstruction device, comprising an acquisition module and a processing module;
the acquisition module being configured to acquire a table image;
the processing module being configured to extract image features of the table image, and determine a cell category, cell coordinates and a cell pixel mask of the table image according to the image features; reconstruct grid lines according to the cell coordinates and the cell pixel mask to obtain a first table, and merge cells of the first table according to the cell category to obtain a second table;
wherein the cell categories include blank cells, basic cells and merged cells.

9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the table reconstruction method according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the table reconstruction method according to any one of claims 1 to 7.
CN202410102694.3A 2024-01-24 2024-01-24 Table reconstruction method, device, electronic device and storage medium Active CN117973337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410102694.3A CN117973337B (en) 2024-01-24 2024-01-24 Table reconstruction method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410102694.3A CN117973337B (en) 2024-01-24 2024-01-24 Table reconstruction method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN117973337A true CN117973337A (en) 2024-05-03
CN117973337B CN117973337B (en) 2024-10-11

Family

ID=90856737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410102694.3A Active CN117973337B (en) 2024-01-24 2024-01-24 Table reconstruction method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN117973337B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190266394A1 (en) * 2018-02-26 2019-08-29 Abc Fintech Co., Ltd. Method and device for parsing table in document image
CN111932545A (en) * 2020-07-14 2020-11-13 浙江大华技术股份有限公司 Image processing method, target counting method and related device thereof
CN113297975A (en) * 2021-05-25 2021-08-24 新东方教育科技集团有限公司 Method and device for identifying table structure, storage medium and electronic equipment
CN114529925A (en) * 2022-04-22 2022-05-24 华南理工大学 Method for identifying table structure of whole line table
CN115331245A (en) * 2022-10-12 2022-11-11 中南民族大学 A table structure recognition method based on image instance segmentation
CN115761773A (en) * 2022-11-17 2023-03-07 上海交通大学 In-image table recognition method and system based on deep learning
WO2023134447A1 (en) * 2022-01-12 2023-07-20 华为技术有限公司 Data processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant