CN116229490A - Layout analysis method, device, equipment and medium of a graph neural network

Info

Publication number: CN116229490A
Application number: CN202310246347.3A
Authority: CN
Inventors: 魏舒; 陈运文; 纪达麒; 李巍豪; 高翔
Original assignee: Datagrand Information Technology Shanghai Co ltd
Current assignee: Datagrand Information Technology Shanghai Co ltd
Priority date: 2023-03-14
Filing date: 2023-03-14
Publication date: 2023-06-06

Abstract

The invention discloses a layout analysis method, device, equipment and medium of a graph neural network. The model training method comprises: inputting a text image sample into a text detection module to obtain text detection sample boxes; creating a first graph to be analyzed from the text detection sample boxes, and determining node features and edge features of the first graph to be analyzed; and inputting the node features and the edge features into an original graph neural network model for classification training of node types and edge types, to obtain a target graph neural network model. The target graph neural network model is used to perform layout analysis on text in text images. The technical solution of the embodiments of the invention can improve layout analysis performance and reduce application limitations.

Description

Layout analysis method, device, equipment and medium of a graph neural network

Technical field

The invention relates to the technical field of layout analysis, and in particular to a layout analysis method, device, equipment and medium of a graph neural network.

Background

Layout analysis is widely used in practice. For example, converting an image-format document to a text format requires accurate layout analysis results, on which follow-up tasks such as document comparison and information extraction are then based.

At present, there are two common approaches to layout analysis: object detection, and the Transformer Encoder family of models. With object detection, the resulting detection boxes are not very accurate, overlapping detection boxes are hard to resolve, and post-processing rules are needed to match the boxes to the text, so the layout analysis results are poor. Transformer Encoder models impose a maximum length on the amount of text, yet in practice it is common for a single page to contain more than 512 or even 1024 tokens. In addition, these models are large, demand substantial computing resources, and must be adapted to a specific tokenizer; they are strongly tied to a language and cannot be transferred directly between languages. Moreover, most public datasets in this field are in English: DocBank and PubLayNet, for example, each contain hundreds of thousands of samples, whereas datasets in Chinese or other languages are few, and those that exist often suffer from poor annotation quality.

Summary of the invention

The invention provides a layout analysis method, device, equipment and medium of a graph neural network, to solve the problems that existing layout analysis methods give poor results and have significant application limitations.

According to one aspect of the present invention, a model training method is provided, comprising:

inputting a text image sample into a text detection module to obtain text detection sample boxes;

creating a first graph to be analyzed from the text detection sample boxes, and determining node features and edge features of the first graph to be analyzed, wherein the nodes in the first graph to be analyzed correspond to the text detection sample boxes;

inputting the node features and the edge features into an original graph neural network model for classification training of node types and edge types, to obtain a target graph neural network model;

wherein the target graph neural network model is used to perform layout analysis on text in text images.

According to another aspect of the present invention, a layout analysis method is provided, comprising: acquiring a text image to be analyzed, and inputting the text image to be analyzed into a text detection module to obtain text detection boxes;

creating a second graph to be analyzed from the text detection boxes, and determining target node features and target edge features of the second graph to be analyzed, wherein the target nodes in the second graph to be analyzed correspond to the text detection boxes;

inputting the target node features and the target edge features into a target graph neural network model to obtain target node types and target edge types;

wherein the target graph neural network model is a model trained by the model training method of any embodiment.

According to another aspect of the present invention, a model training device is provided, comprising:

a text detection sample box acquisition module, configured to input a text image sample into a text detection module to obtain text detection sample boxes;

a first feature determination module, configured to create a first graph to be analyzed from the text detection sample boxes and to determine node features and edge features of the first graph to be analyzed, wherein the nodes in the first graph to be analyzed correspond to the text detection sample boxes;

a target graph neural network model determination module, configured to input the node features and the edge features into an original graph neural network model for classification training of node types and edge types, to obtain a target graph neural network model.

According to another aspect of the present invention, a layout analysis device is provided, comprising:

a text detection box acquisition module, configured to acquire a text image to be analyzed and to input the text image to be analyzed into a text detection module to obtain text detection boxes;

a second feature determination module, configured to create a second graph to be analyzed from the text detection boxes and to determine target node features and target edge features of the second graph to be analyzed, wherein the target nodes in the second graph to be analyzed correspond to the text detection boxes;

a classification module, configured to input the target node features and the target edge features into a target graph neural network model to obtain target node types and target edge types.

According to another aspect of the present invention, an electronic device is provided, the electronic device comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can perform the model training method or the layout analysis method of any embodiment of the present invention.

According to another aspect of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores computer instructions which, when executed by a processor, implement the model training method or the layout analysis method of any embodiment of the present invention.

In the technical solution of the embodiments of the present invention, a text image sample is input into a text detection module to obtain text detection sample boxes; a first graph to be analyzed is created from the text detection sample boxes, and its node features and edge features are determined; the node features and edge features are then input into an original graph neural network model for classification training of node types and edge types, yielding a target graph neural network model that performs layout analysis on text in text images. Because node types and edge types reflect how the text detection sample boxes are laid out in the text image sample, training the original graph neural network model to classify node types and edge types from the extracted node features and edge features gives the resulting target graph neural network model better layout analysis results. This solves the problems of poor results and significant application limitations in existing layout analysis methods, improving layout analysis performance and reducing application limitations.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present invention, nor to limit the scope of the present invention. Other features of the present invention will become readily understood from the following description.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a model training method provided by Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a model training method provided by Embodiment 2 of the present invention;

FIG. 3 is a flowchart of a layout analysis method provided by Embodiment 3 of the present invention;

FIG. 4 is an algorithm logic diagram of a layout analysis method provided by Embodiment 3 of the present invention;

FIG. 5 is a schematic structural diagram of a model training device provided by Embodiment 4 of the present invention;

FIG. 6 is a schematic structural diagram of a layout analysis device provided by Embodiment 5 of the present invention;

FIG. 7 is a schematic structural diagram of an electronic device that can be used to implement the embodiments of the present invention.

Detailed description

To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.

Embodiment 1

FIG. 1 is a flowchart of a model training method provided by Embodiment 1 of the present invention. This embodiment is applicable to efficient and accurate layout analysis of text images. The method may be executed by a model training device, which may be implemented in hardware and/or software and may be configured in an electronic device. As shown in FIG. 1, the method includes:

S110. Input a text image sample into a text detection module to obtain text detection sample boxes.

Here, a text image sample may be an image sample used for layout analysis training. By way of example, text image samples may include, but are not limited to, bills, newspapers and magazines. The text detection module may be any model capable of text detection, and may be used to detect text regions in an image. A text detection sample box may be the bounding box of a text region that the text detection module identifies in the text image sample.

In this embodiment of the present invention, text image samples may be obtained according to the layout analysis requirements and input into the text detection module, which identifies the text regions in each text image sample to produce the text detection sample boxes.

S120. Create a first graph to be analyzed from the text detection sample boxes, and determine node features and edge features of the first graph to be analyzed.

Here, the first graph to be analyzed may be a graph drawn according to the correlation between the text detection sample boxes. The nodes in the first graph to be analyzed correspond to the text detection sample boxes, and an edge connecting two nodes corresponds to the positional relationship, contextual relationship and the like of the connected nodes. Node features may be the features of the nodes that make up the first graph to be analyzed. Edge features may be the features of the edges between nodes in the first graph to be analyzed.

In this embodiment of the present invention, the correlation between text detection sample boxes may be determined from the positions of the boxes in the text image sample and their adjacency relationships, and the first graph to be analyzed is drawn according to that correlation. The node features of the first graph to be analyzed and the edge features of the edges between its nodes are then determined, and the label data corresponding to the node features and the label data corresponding to the edge features are obtained.

S130. Input the node features and the edge features into an original graph neural network model for classification training of node types and edge types, to obtain a target graph neural network model.

Here, the original graph neural network model may be any known graph neural network model. The target graph neural network model may be used to perform layout analysis on text in text images. A node type may indicate the category of the text detection sample box corresponding to the node; this category can be understood as the type of layout element, such as title, paragraph, header or footer. An edge type may indicate whether the nodes connected by an edge in the first graph to be analyzed belong together, which helps determine whether the text detection sample boxes corresponding to those nodes belong to the same paragraph.
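To make the role of edge types concrete, the following minimal sketch shows one way binary edge predictions could be turned into groups of boxes (for example, paragraphs). The source does not describe this grouping step; the use of a union-find structure and the names `group_boxes`, `edges` and `same_instance` are assumptions made purely for illustration.

```python
from collections import defaultdict


def group_boxes(num_nodes, edges, same_instance):
    """Group node indices into instances (hypothetical helper).

    edges: list of (u, v) node index pairs.
    same_instance: one boolean per edge, True when the edge is predicted
    to connect boxes that belong to the same instance.
    """
    parent = list(range(num_nodes))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (u, v), keep in zip(edges, same_instance):
        if keep:
            ru, rv = find(u), find(v)
            if ru != rv:
                # Keep the smaller index as the root for deterministic output.
                parent[max(ru, rv)] = min(ru, rv)

    groups = defaultdict(list)
    for i in range(num_nodes):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Five text boxes in a chain; two edges are predicted as "same instance".
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
same_instance = [True, False, True, False]
paragraphs = group_boxes(5, edges, same_instance)
print(paragraphs)
```

Here boxes 0 and 1 form one group, boxes 2 and 3 another, and box 4 stays on its own. The node type predicted for each box could then label each group, although how the two predictions are combined is not specified in the source.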

In this embodiment of the present invention, the node features, the edge features, the label data corresponding to the node features and the label data corresponding to the edge features may be input into the original graph neural network model. The model is first trained to classify node types based on the node features and their label data, which yields updated node features. It is then trained to classify edge types based on the updated node features, the input edge features and the label data corresponding to the edge features. The result is the target graph neural network model, which can be used to perform layout analysis on text in text images.

Embodiment 2

FIG. 2 is a flowchart of a model training method provided by Embodiment 2 of the present invention. This embodiment builds on the above embodiment and gives a specific optional implementation of inputting the node features and edge features into the original graph neural network model for classification training of node types and edge types. As shown in FIG. 2, the method includes:

S210. Input a text image sample into a text detection module to obtain text detection sample boxes.

S220. Create a first graph to be analyzed from the text detection sample boxes, and determine node features and edge features of the first graph to be analyzed.

In an optional embodiment of the present invention, creating the first graph to be analyzed from the text detection sample boxes may include: obtaining a preset graph construction rule; and creating the first graph to be analyzed according to the preset graph construction rule and the text detection sample boxes. The preset graph construction rule may include a full connection rule, a first adjacent-text-box edge building rule, or a second adjacent-text-box edge building rule.

Here, the preset graph construction rule may be a predefined rule for constructing a connected graph. The full connection rule may be a rule that creates a connecting edge between any two nodes of the connected graph. The first adjacent-text-box edge building rule may be a rule that uses a Beta-skeleton to create connecting edges between adjacent nodes. The second adjacent-text-box edge building rule may be a rule that uses KNN (K-Nearest Neighbor) to create connecting edges between the K nearest nodes.
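As an illustration of two of these rules, the sketch below builds undirected edges under the full connection rule and under a KNN rule. It assumes that boxes are given as `(x1, y1, x2, y2)` tuples, that distance is measured between box centers, and that the KNN rule links each node to its K nearest neighbours; none of these details is fixed by the source, and the function names are hypothetical. The Beta-skeleton rule is not shown.

```python
import math
from itertools import combinations


def box_center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def fully_connected_edges(boxes):
    """Full connection rule: one undirected edge per pair of nodes."""
    return list(combinations(range(len(boxes)), 2))


def knn_edges(boxes, k):
    """KNN rule (assumed form): link each node to its k nearest neighbours."""
    centers = [box_center(b) for b in boxes]
    edges = set()
    for i, ci in enumerate(centers):
        # Sort the other nodes by center distance; ties break on node index.
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: (math.dist(ci, centers[j]), j),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store as undirected pair
    return sorted(edges)


# Three stacked text lines and one distant footer line.
boxes = [(0, 0, 100, 10), (0, 12, 100, 22), (0, 24, 100, 34), (0, 200, 100, 210)]
full = fully_connected_edges(boxes)
knn = knn_edges(boxes, k=1)
print(len(full), knn)
```

The full connection rule grows quadratically with the number of boxes, whereas the KNN rule keeps the edge count roughly linear, which is one plausible reason for offering both options.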

In this embodiment of the present invention, the preset graph construction rule may be chosen, according to the graph construction needs of the first graph to be analyzed, from existing rules for building edges between the nodes of a connected graph (such as the full connection rule, the first adjacent-text-box edge building rule or the second adjacent-text-box edge building rule). Nodes corresponding to the text detection sample boxes are then set up and connected according to the preset graph construction rule to obtain the first graph to be analyzed.

S230. Input the node features into the original graph neural network model, determine the node update features of each node in the first graph to be analyzed, and perform node type training on the original graph neural network model according to the node update features of each node.

Here, a node update feature may be the new feature of a node that the original graph neural network model determines according to the intersection nodes in the first graph to be analyzed. For example, if there is an edge between node 1 and node 2 and an edge between node 1 and node 3, then node 1 is the intersection node of node 2 and node 3.

In this embodiment of the present invention, after the node features are input into the original graph neural network model, the model may update the node features of the first graph to be analyzed according to the node features of the nodes associated with each intersection node, yielding the node update features of each node in the first graph to be analyzed. The original graph neural network model is then trained to recognize node types according to the node update features of each node and the label data corresponding to the node features.
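The update step can be pictured with a minimal sketch in which each node's new feature is the mean of its own feature and those of its neighbours. Mean aggregation is only an assumption chosen for simplicity; real graph neural network layers add learned weights and non-linearities, and the source does not prescribe a particular update. The function name `mean_aggregate` is hypothetical.

```python
def mean_aggregate(node_feats, edges):
    """Return updated node features: mean over each node and its neighbours."""
    n = len(node_feats)
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        # Edges are undirected, so each endpoint sees the other.
        neighbours[u].append(v)
        neighbours[v].append(u)

    updated = []
    for i in range(n):
        group = [node_feats[i]] + [node_feats[j] for j in neighbours[i]]
        dim = len(node_feats[i])
        updated.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return updated


# Index 0 plays the role of "node 1" in the text: it has edges to the other
# two nodes and is therefore their intersection node.
node_feats = [[0.0, 3.0], [3.0, 0.0], [6.0, 6.0]]
edges = [(0, 1), (0, 2)]
updated = mean_aggregate(node_feats, edges)
print(updated)
```

The intersection node ends up with a feature that mixes information from both of its neighbours, while the other two nodes only see the intersection node.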

In an optional embodiment of the present invention, determining the node features and edge features of the first graph to be analyzed may include: using the text box coordinates and text box image features of the text detection sample box corresponding to each node as the node features of the first graph to be analyzed; and using, for each edge in the first graph to be analyzed, the node features of the nodes it connects, the relative distance between the text box coordinates matched by the edge, the text box aspect ratios matched by the edge, and the predicted types of the nodes corresponding to the edge as the edge features of the first graph to be analyzed.

Here, the text box coordinates may be the coordinates of the text detection sample box in the text image sample, and may include, but are not limited to, the coordinates of the upper-left and lower-right corners of the text detection sample box. The text box image feature may be the local image region of the text image sample that corresponds to the text detection sample box. The text box aspect ratio may be the ratio of the length of the text detection sample box to its width.

In this embodiment of the present invention, the text box coordinates and text box image features of the text detection sample box corresponding to each node may be obtained from the position of that box in the text image sample, and used as the node features of the first graph to be analyzed. Next, for each edge in the first graph to be analyzed, the following are determined: the node features of the nodes the edge connects; the relative distance between the text box coordinates matched by the edge (that is, between the text box coordinates of the text detection sample boxes corresponding to the connected nodes); and the text box aspect ratios matched by the edge (that is, the length-to-width ratios of the text detection sample boxes corresponding to the connected nodes). The node types of the nodes connected by each edge are then predicted, giving the predicted types of those nodes. Together, the node features of the connected nodes, the relative distance between the matched text box coordinates, the matched text box aspect ratios and the predicted types of the connected nodes form the edge features of the first graph to be analyzed.
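The geometric part of these edge features can be sketched as follows. The sketch assumes `(x1, y1, x2, y2)` boxes, takes the relative distance as the per-coordinate difference normalized by page size, and takes the aspect ratio as width divided by height; these conventions and the function names are illustrative assumptions, not details from the source. The node features and predicted node types that the text also lists as edge features are left out for brevity.

```python
def aspect_ratio(box):
    """Width divided by height of an (x1, y1, x2, y2) box (assumed convention)."""
    x1, y1, x2, y2 = box
    return (x2 - x1) / (y2 - y1)


def edge_feature(box_u, box_v, page_w, page_h):
    """Geometric edge feature for the edge between two boxes.

    Layout: [dx1, dx2, dy1, dy2, ratio_u, ratio_v], where the d* entries are
    coordinate differences normalized by the page width or height.
    """
    dx = [(a - b) / page_w for a, b in zip(box_u[0::2], box_v[0::2])]
    dy = [(a - b) / page_h for a, b in zip(box_u[1::2], box_v[1::2])]
    return dx + dy + [aspect_ratio(box_u), aspect_ratio(box_v)]


box_u = (0, 0, 100, 10)
box_v = (0, 20, 200, 40)
feat = edge_feature(box_u, box_v, page_w=1000, page_h=1000)
print(feat)
```

Normalizing by page size keeps the feature scale comparable across images of different resolutions, which is a common design choice rather than something the source requires.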

S240. Perform edge type classification training on the original graph neural network model according to the node update features of each node and the edge features, to obtain the target graph neural network model.

In this embodiment of the present invention, the original graph neural network model may be trained on binary classification of edge types according to the node update features of each node, the edge features and the label data corresponding to the edge features, yielding the target graph neural network model, which is then used to perform layout analysis on text in text images.
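The source does not name a loss function for the binary edge classification. As one common choice, assumed here purely for illustration, the sketch below computes binary cross-entropy from raw edge logits in a numerically stable form; `binary_cross_entropy` is a hypothetical helper.

```python
import math


def binary_cross_entropy(logits, labels):
    """Mean binary cross-entropy over edges, computed from raw logits.

    Uses the stable identity
        loss = max(z, 0) - z * y + log(1 + exp(-|z|))
    which equals -[y * log(s) + (1 - y) * log(1 - s)] with s = sigmoid(z).
    """
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Label 1 means "the two boxes belong to the same instance" (assumed encoding).
edge_logits = [4.0, -3.0, 0.0]
edge_labels = [1, 0, 1]
loss = binary_cross_entropy(edge_logits, edge_labels)
print(round(loss, 4))
```

Confident, correct predictions (the first two edges) contribute little to the loss, while the uncertain third edge contributes log 2.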

In a specific example, a document image (the text image sample) may be input into the text detection module to obtain text detection sample boxes. The text detection sample boxes are then used as nodes, and the first graph to be analyzed is constructed according to a preset graph construction rule (any one of the full connection rule, the first adjacent-text-box edge building rule or the second adjacent-text-box edge building rule).

After the first graph to be analyzed is obtained, its node features and edge features can be constructed. Specifically, the text box coordinates of the text detection sample box corresponding to each node can be determined; if a text detection sample box is not rectangular, the number of coordinate points in the text box coordinates can be extended. The document image is input into a convolutional neural network to obtain the image features of the document image, and the local image features of the text detection sample box corresponding to each node are then obtained from those image features and the text box coordinates. Each edge connects two nodes and indicates whether the two text detection sample boxes are related, that is, whether they belong to the same instance, so edge features help with layout analysis. The node features and edge features are then input into the original graph neural network model for classification training of node types and edge types, yielding the target graph neural network model.
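How the local image feature is cut out of the document-level feature map is not detailed in the source. The sketch below shows one simple possibility, assumed for illustration: average all feature-map cells that the box overlaps. The feature map is a plain nested list of shape height x width x channels, `stride` is the assumed downsampling factor of the convolutional network, and `pool_box_feature` is a hypothetical helper. Detection frameworks often use more refined operators such as RoIAlign for this step.

```python
import math


def pool_box_feature(feature_map, box, stride):
    """Average the feature-map cells overlapped by an (x1, y1, x2, y2) box.

    feature_map: nested lists indexed as [row][col][channel].
    box: coordinates in image pixels.
    stride: image pixels per feature-map cell.
    """
    x1, y1, x2, y2 = box
    h, w = len(feature_map), len(feature_map[0])

    # Convert pixel coordinates to cell indices, clamped to the map bounds.
    c0 = min(max(int(x1 // stride), 0), w - 1)
    r0 = min(max(int(y1 // stride), 0), h - 1)
    # Exclusive end indices; always cover at least one cell.
    c1 = min(max(math.ceil(x2 / stride), c0 + 1), w)
    r1 = min(max(math.ceil(y2 / stride), r0 + 1), h)

    cells = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    channels = len(cells[0])
    return [sum(cell[k] for cell in cells) / len(cells) for k in range(channels)]


# A 2 x 2 feature map with one channel, produced at stride 8 from a 16 x 16 image.
feature_map = [[[1.0], [2.0]], [[3.0], [4.0]]]
top_row = pool_box_feature(feature_map, (0, 0, 16, 8), stride=8)
bottom_right = pool_box_feature(feature_map, (8, 8, 16, 16), stride=8)
print(top_row, bottom_right)
```

The pooled vector could then be concatenated with the box coordinates to form the node feature described in the text.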

The original graph neural network model may include, but is not limited to, GCN, GAT, GAT-v2, DGCNN and GravNet. The model may stack multiple layers and has two classification heads: one performs classification training of node types, and the other may be used for classification training of edge types. The final original graph neural network model is assembled from the parts described above, and the choice made for each part is tuned per dataset to find the best-performing configuration.
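The two-head arrangement can be sketched in plain Python as two linear classifiers applied on top of the node embeddings produced by the stacked graph layers. The source does not specify the head architecture; feeding the edge head the concatenation of both endpoint embeddings and the edge feature is an assumption, and the weights below are hand-picked toy values rather than trained parameters.

```python
def linear(x, weight, bias):
    """Dense layer: one output logit per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def classify(node_embs, edges, edge_feats, node_head, edge_head):
    """Apply the node head and the edge head (both hypothetical).

    node_head / edge_head: (weight, bias) tuples for the two classifiers.
    The edge head sees [emb_u, emb_v, edge_feature] concatenated.
    """
    node_types = [argmax(linear(h, *node_head)) for h in node_embs]
    edge_types = []
    for (u, v), e in zip(edges, edge_feats):
        pair = node_embs[u] + node_embs[v] + e
        edge_types.append(argmax(linear(pair, *edge_head)))
    return node_types, edge_types


# Toy setup: 2 nodes with 2-dimensional embeddings, 1 edge with a 1-dimensional feature.
node_embs = [[1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1)]
edge_feats = [[0.5]]
node_head = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
edge_head = ([[0.0, 0.0, 0.0, 0.0, -1.0], [0.0, 0.0, 0.0, 0.0, 1.0]], [0.0, 0.0])
node_types, edge_types = classify(node_embs, edges, edge_feats, node_head, edge_head)
print(node_types, edge_types)
```

Sharing the graph layers between both heads means a single forward pass yields the layout element type of every box and the same-instance decision for every edge.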

In the technical solution of this embodiment of the present invention, a text image sample is input into a text detection module to obtain text detection sample boxes; a first graph to be analyzed is created from the text detection sample boxes, and its node features and edge features are determined; the node features are input into the original graph neural network model to determine the node update features of each node in the first graph to be analyzed, and node type training is performed on the original graph neural network model according to those node update features; edge type classification training is then performed on the original graph neural network model according to the node update features of each node and the edge types, yielding the target graph neural network model, which performs layout analysis on the text in a text image. Because node types and edge types reflect how the text detection sample boxes are laid out in the text image sample, training the original graph neural network model to classify node types and edge types from the extracted node features and edge features gives the resulting target graph neural network model a better layout analysis effect. This solves the problems of poor analysis results and severe application limitations in existing layout analysis methods, improving layout analysis performance and reducing application limitations.
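
One way to combine the two training objectives is a weighted sum of a node type loss and an edge type loss, sketched below. This is an assumption for illustration: the text describes node type training followed by edge type classification training but does not state how the losses are combined, and the names `softmax_cross_entropy`, `joint_loss` and the parameter `edge_weight` are hypothetical.

```python
import math
from typing import List


def softmax_cross_entropy(logits: List[float], label: int) -> float:
    """Cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mean_loss(all_logits: List[List[float]], labels: List[int]) -> float:
    """Average cross-entropy over a batch of nodes or edges."""
    losses = [softmax_cross_entropy(z, y) for z, y in zip(all_logits, labels)]
    return sum(losses) / len(losses)


def joint_loss(node_logits: List[List[float]], node_labels: List[int],
               edge_logits: List[List[float]], edge_labels: List[int],
               edge_weight: float = 1.0) -> float:
    """Node type loss plus weighted edge type loss.

    Edge labels are 1 when both boxes belong to the same instance, else 0.
    """
    return (mean_loss(node_logits, node_labels)
            + edge_weight * mean_loss(edge_logits, edge_labels))
```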

This solution uses the original graph neural network model as the core of the algorithm, so the amount of computation is small, and a good result can be achieved by setting the node features and edge features of the first graph to be analyzed appropriately. Because the solution uses no text information, it is language-independent: the large number of public English datasets can be used directly, without restriction to a specific language. There is no limit on the amount of text on a single page, and there is no need to match text boxes or to apply complex post-processing to overlapping object detection boxes. Compared with existing solutions, this solution has a considerable advantage in overall performance (parameter count, GPU memory usage, inference time and accuracy) and is suitable for industrial deployment.

Embodiment 3

FIG. 3 is a flowchart of a layout analysis method provided by Embodiment 3 of the present invention. The method may be performed by a layout analysis device, which may be implemented in hardware and/or software and may be configured in an electronic device. As shown in FIG. 3, the method includes:

S310. Acquire a text image to be analyzed, and input the text image to be analyzed into a text detection module to obtain text detection boxes.

The text image to be analyzed may be an image that requires layout analysis. A text detection box may be the border of a text region that the text detection module identifies in the text image to be analyzed.

In this embodiment of the present invention, the acquired text image to be analyzed can be input into the text detection module, which identifies the text regions in the image to obtain the text detection boxes.

S320. Create a second graph to be analyzed according to the text detection boxes, and determine target node features and target edge features of the second graph to be analyzed.

The second graph to be analyzed may be a graph drawn according to the correlation between the text detection boxes. The target node features may be the features of the nodes that make up the second graph to be analyzed. The target edge features may be the features of the edges between nodes in the second graph to be analyzed. A target node in the second graph to be analyzed corresponds to a text detection box, and a target edge connecting target nodes corresponds to the positional relationship, contextual relationship and the like of the connected target nodes. A target node may be a node in the second graph to be analyzed that corresponds to a text detection box. A target edge may be an edge connecting target nodes in the second graph to be analyzed.

In this embodiment of the present invention, the correlation between text detection boxes can be determined from their positions in the text image to be analyzed and from their adjacency; the second graph to be analyzed is then drawn according to that correlation, and its target node features and target edge features are determined.
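
Building a graph from box positions and adjacency can be sketched as below. The fully connected variant connects every pair of boxes. The nearest-neighbour variant is a hypothetical adjacency rule chosen for illustration (connect each box to its `k` closest boxes by centre distance); this passage does not define the actual adjacency rule, and the function names are assumptions.

```python
from itertools import combinations
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Edge = Tuple[int, int]


def center(box: Box) -> Tuple[float, float]:
    """Centre point of a box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def fully_connected(boxes: List[Box]) -> List[Edge]:
    """Connect every pair of boxes: n * (n - 1) / 2 undirected edges."""
    return list(combinations(range(len(boxes)), 2))


def nearest_neighbour_edges(boxes: List[Box], k: int = 2) -> List[Edge]:
    """Connect each box to its k nearest boxes by centre distance (assumed rule)."""
    centers = [center(b) for b in boxes]
    edges: Set[Edge] = set()
    for i, (cx, cy) in enumerate(centers):
        others = sorted(
            (j for j in range(len(boxes)) if j != i),
            key=lambda j: (centers[j][0] - cx) ** 2 + (centers[j][1] - cy) ** 2,
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return sorted(edges)
```

For three stacked text lines, `nearest_neighbour_edges(boxes, k=1)` keeps only the links between vertically neighbouring lines, whereas `fully_connected(boxes)` also links the first and last line.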

S330. Input the target node features and the target edge features into the target graph neural network model to obtain target node types and target edge types.

The target graph neural network model may be a model trained by the model training method of any embodiment of the present invention. The target node type can be used to represent the category to which a text detection box belongs; the kinds of target node type are the same as the kinds of category to which a text detection box can belong. The target edge type can be used to indicate whether the nodes connected by an edge in the second graph to be analyzed belong to the same instance, where an instance can be understood as a paragraph.

In this embodiment of the present invention, the target node features and the target edge features can be input into the target graph neural network model to obtain the target node types and target edge types. The layout position corresponding to each text detection box in the text image to be analyzed is then determined from the target node types and target edge types, which realizes layout analysis of the text image to be analyzed.

In an optional embodiment of the present invention, after the target node types and target edge types are obtained, the method may further include: determining, according to the target node type of each target node, the layout element type matching each target node; determining, according to the target edge type of each target edge, the same-instance relationship matching each target edge; and determining the layout analysis result of the text image to be analyzed according to the layout element type matching each target node and the same-instance relationship matching each target edge.

The layout element type may be the kind of element that makes up a layout. The same-instance relationship can be used to describe whether text detection boxes belong to the same instance, where an instance can be understood as a paragraph. The layout analysis result may be the result of analyzing the text layout of the text image to be analyzed.

In this embodiment of the present invention, the target node type of each target node can be taken as the layout element type matching that target node, and the same-instance relationship matching each target edge can be determined from the target edge type of that edge. The layout of the text in the text image to be analyzed is then determined from the layout element types matching the target nodes and the same-instance relationships matching the target edges, which gives the layout analysis result of the text image to be analyzed. The text in the text image to be analyzed can further be laid out based on this layout analysis result.
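
Combining the two predictions into a layout result can be sketched with a union-find structure: boxes joined by edges predicted as "same instance" are merged into one group. The majority vote used to label each group and the names `group_instances` and `layout_result` are assumptions made for illustration; the text specifies only that the result is determined from the layout element types and the same-instance relationships.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def group_instances(num_nodes: int, edges: List[Edge],
                    same_instance: List[bool]) -> List[List[int]]:
    """Merge nodes linked by same-instance edges using union-find."""
    parent = list(range(num_nodes))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (u, v), keep in zip(edges, same_instance):
        if keep:
            parent[find(u)] = find(v)

    groups: Dict[int, List[int]] = {}
    for i in range(num_nodes):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def layout_result(node_types: List[str], edges: List[Edge],
                  same_instance: List[bool]) -> List[Tuple[str, List[int]]]:
    """Label each instance with the most common node type among its boxes."""
    result = []
    for members in group_instances(len(node_types), edges, same_instance):
        label = Counter(node_types[i] for i in members).most_common(1)[0][0]
        result.append((label, members))
    return result
```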

FIG. 4 is an algorithm logic diagram of a layout analysis method provided by Embodiment 3 of the present invention. As shown in FIG. 4, the text image to be analyzed is first acquired and input into the text detection module, which performs text detection to obtain the text detection boxes. A connected graph, namely the second graph to be analyzed, is built from the text detection boxes, and the target node features and target edge features of the second graph to be analyzed are then constructed. The target node features are updated by the target graph neural network model to obtain updated target node features. The node type of each target node is classified based on the updated target node features, and the target edge features are then subjected to binary classification based on the updated target node features to obtain the target edge types. Finally, the layout analysis result can be determined from the node types of the target nodes and the target edge types.

In document comparison, layout analysis must first be performed on the documents, which are then compared one layout element type at a time. The present invention uses the text image to be analyzed together with the text detection boxes obtained from OCR (Optical Character Recognition) to predict the category of each text detection box (title, paragraph, header, footer and so on) and to predict whether neighboring text detection boxes belong to the same instance (for example, text box A and text box B both belong to the same paragraph). If an object detection algorithm is used instead, the input is only the image, with no prior knowledge of the text detection boxes; if a detected box is slightly wider or slightly narrower than the correct box, a paragraph may gain or lose a line, which is fatal for a comparison task. With an algorithm based on the target graph neural network model, the probability of such errors is greatly reduced by design. Even when an erroneous sample appears, simply adding it to the training dataset allows the model to fit it quickly, whereas it is very difficult to force an object detection algorithm to make its predicted boxes 100% accurate.

In document parsing, a document with two or more columns must first undergo layout analysis to obtain the correct reading order. By determining the relationship between neighboring text detection boxes, the present invention can quickly identify the gaps between the columns. Transformer Encoder based algorithms, by contrast, impose a maximum on the amount of text on a single page (for a typical newspaper page, for example, it is quite normal for the token count to exceed 512). Moreover, when judging the relationships between tokens, the common practice is to compute a score for every pair in a fully connected manner, which consumes resources unnecessarily and makes the algorithm harder to train.
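
The cost difference between fully connected pairwise scoring and neighbour-only edges can be made concrete with a small calculation. The neighbour bound below assumes each element is linked to at most `k` others, which is an illustrative assumption rather than a figure from the text; the function names are hypothetical.

```python
def full_pairs(n: int) -> int:
    """Number of unordered pairs scored by a fully connected scheme."""
    return n * (n - 1) // 2


def neighbour_edge_bound(n: int, k: int) -> int:
    """Upper bound on edges when each element links to at most k others."""
    return min(n * k, full_pairs(n))


# 512 elements: 130816 pairwise scores versus at most 2048 neighbour edges.
print(full_pairs(512), neighbour_edge_bound(512, 4))
```

The pairwise count grows quadratically with the number of elements, while the neighbour bound grows linearly.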

In the technical solution of this embodiment of the present invention, a text image to be analyzed is acquired and input into a text detection module to obtain text detection boxes; a second graph to be analyzed is created from the text detection boxes, and its target node features and target edge features are determined; the target node features and target edge features are then input into the target graph neural network model to obtain target node types and target edge types. Because the target node types and target edge types reflect how the text detection boxes are laid out in the text image to be analyzed, processing the target node features and target edge features with the target graph neural network model improves the layout analysis effect, and detection is not constrained by a word-count limit. This solves the problems of poor analysis results and severe application limitations in existing layout analysis methods, improving layout analysis performance and reducing application limitations.

Embodiment 4

FIG. 5 is a schematic structural diagram of a model training device provided by Embodiment 4 of the present invention. As shown in FIG. 5, the device includes: a text detection sample box acquisition module 410, a first feature determination module 420 and a target graph neural network model determination module 430;

the text detection sample box acquisition module 410 is configured to input a text image sample into a text detection module to obtain text detection sample boxes;

the first feature determination module 420 is configured to create a first graph to be analyzed according to the text detection sample boxes, and to determine node features and edge features of the first graph to be analyzed, wherein the nodes in the first graph to be analyzed correspond to the text detection sample boxes;

the target graph neural network model determination module 430 is configured to input the node features and the edge features into the original graph neural network model for classification training of node types and edge types, so as to obtain the target graph neural network model.

Optionally, the first feature determination module 420 includes a first to-be-analyzed graph creation unit, configured to acquire a preset composition rule (that is, a graph construction rule) and to create the first graph to be analyzed according to the preset composition rule and the text detection sample boxes; the preset composition rule includes a full-connection rule, a first adjacent-text-box edge construction rule or a second adjacent-text-box edge construction rule.

Optionally, the first feature determination module 420 includes a first feature determination unit, configured to take the text box coordinates and text box image features of the text detection sample box corresponding to each node as the node features of the first graph to be analyzed, and to take the node features of the nodes connected by each edge, the relative distance between the text box coordinates matched by each edge, the aspect ratios of the text boxes matched by each edge, and the predicted types of the nodes corresponding to each edge as the edge features of the first graph to be analyzed.
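
The four edge feature components listed above can be assembled as in the following sketch. The concrete encodings are assumptions made for illustration: relative distance is taken as page-normalized coordinate offsets, aspect ratio as width divided by height, and the predicted node types as one-hot vectors. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def aspect_ratio(box: Box) -> float:
    """Width divided by height, guarded against zero height."""
    x0, y0, x1, y1 = box
    return (x1 - x0) / max(y1 - y0, 1e-6)


def relative_offsets(a: Box, b: Box, page_w: float, page_h: float) -> List[float]:
    """Coordinate-wise offsets from box a to box b, normalized by page size."""
    return [(b[0] - a[0]) / page_w, (b[1] - a[1]) / page_h,
            (b[2] - a[2]) / page_w, (b[3] - a[3]) / page_h]


def one_hot(index: int, size: int) -> List[float]:
    """Encode a predicted node type as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def edge_feature(feat_a: List[float], feat_b: List[float],
                 box_a: Box, box_b: Box,
                 type_a: int, type_b: int, num_types: int,
                 page_w: float, page_h: float) -> List[float]:
    """Concatenate endpoint features, offsets, aspect ratios and predicted types."""
    return (feat_a + feat_b
            + relative_offsets(box_a, box_b, page_w, page_h)
            + [aspect_ratio(box_a), aspect_ratio(box_b)]
            + one_hot(type_a, num_types) + one_hot(type_b, num_types))
```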

Optionally, the target graph neural network model determination module 430 is specifically configured to: input the node features into the original graph neural network model, determine the node update features of each node in the first graph to be analyzed, and perform node type training on the original graph neural network model according to the node update features of each node; and perform edge type classification training on the original graph neural network model according to the node update features of each node and the edge features.

The model training device provided by this embodiment of the present invention can execute the model training method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.

Embodiment 5

FIG. 6 is a schematic structural diagram of a layout analysis device provided by Embodiment 5 of the present invention. As shown in FIG. 6, the device includes: a text detection box acquisition module 510, a second feature determination module 520 and a classification module 530;

the text detection box acquisition module 510 is configured to acquire a text image to be analyzed and input the text image to be analyzed into a text detection module to obtain text detection boxes;

the second feature determination module 520 is configured to create a second graph to be analyzed according to the text detection boxes, and to determine target node features and target edge features of the second graph to be analyzed, wherein the target nodes in the second graph to be analyzed correspond to the text detection boxes;

the classification module 530 is configured to input the target node features and the target edge features into the target graph neural network model to obtain target node types and target edge types.

Optionally, the layout analysis device further includes a layout analysis result determination module, configured to: take the target node type of each target node as the layout element type matching that target node; determine, according to the target edge type of each target edge, the same-instance relationship matching that target edge; and determine the layout analysis result of the text image to be analyzed according to the layout element type matching each target node and the same-instance relationship matching each target edge.

The layout analysis device provided by this embodiment of the present invention can execute the layout analysis method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.

Embodiment 6

FIG. 7 shows a schematic structural diagram of an electronic device that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices (for example helmets, glasses and watches) and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the invention described and/or claimed herein.

As shown in FIG. 7, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 can perform various appropriate actions and processing according to a computer program stored in the ROM 12 or loaded from a storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to one another through a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

Multiple components of the electronic device 10 are connected to the I/O interface 15, including: an input unit 16, such as a keyboard or a mouse; an output unit 17, such as various types of displays and speakers; a storage unit 18, such as a magnetic disk or an optical disc; and a communication unit 19, such as a network card, a modem or a wireless communication transceiver. The communication unit 19 allows the electronic device 10 to exchange information and data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The processor 11 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various processors that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller or microcontroller. The processor 11 executes the methods and processing described above, such as the model training method or the layout analysis method.

In some embodiments, the model training method or the layout analysis method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the model training method or the layout analysis method described above can be performed. Alternatively, in other embodiments, the processor 11 may be configured to execute the model training method or the layout analysis method in any other suitable manner (for example, by means of firmware).

Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be special-purpose or general-purpose, and can receive data and instructions from a storage system, at least one input device and at least one output device, and transmit data and instructions to the storage system, the at least one input device and the at least one output device.

Computer programs for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus, so that when executed by the processor they cause the functions and operations specified in the flowcharts and/or block diagrams to be carried out. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by, or in combination with, an instruction execution system, apparatus or device. A computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described here can be implemented on an electronic device that has a display device for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor), and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the electronic device. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual, auditory or tactile feedback), and input from the user can be received in any form (including acoustic, voice or tactile input).

The systems and techniques described here can be implemented in a computing system that includes a back-end component (for example, a data server), a computing system that includes a middleware component (for example, an application server), a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), a blockchain network and the Internet.

A computing system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the difficult management and weak business scalability of traditional physical hosts and VPS services.

It should be understood that the various forms of flow shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in a different order, provided that the desired result of the technical solution of the present invention is achieved; no limitation is imposed herein.

The above specific embodiments do not limit the protection scope of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of model training, comprising:

inputting the text image sample to a text detection module to obtain a text detection sample box;

creating a first graph to be analyzed according to the text detection sample box, and determining node characteristics and edge characteristics of the first graph to be analyzed; the nodes in the first graph to be analyzed correspond to the text detection sample box;

inputting the node characteristics and the edge characteristics into an original graph neural network model to perform classification training of node types and edge types, so as to obtain a target graph neural network model;

wherein the target graph neural network model is used to perform layout analysis on text in text images.

2. The method of claim 1, wherein creating a first graph to be analyzed from the text detection sample box comprises:

acquiring a preset composition rule;

creating the first graph to be analyzed according to the preset composition rule and the text detection sample box;

wherein the preset composition rule comprises a full-connection rule, a first adjacent text box edge building rule, or a second adjacent text box edge building rule.
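As an illustration only, the composition step of claim 2 might be sketched as follows. The full-connection rule is taken to mean one edge per pair of text boxes. The two adjacent text box edge building rules are not defined in this passage, so the k-nearest-neighbor rule shown here is a stand-in assumption, and the names `fully_connected_edges`, `nearest_neighbor_edges`, and the `(x_min, y_min, x_max, y_max)` box layout are hypothetical.

```python
from itertools import combinations
from math import hypot
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)
Edge = Tuple[int, int]


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a text box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def fully_connected_edges(boxes: List[Box]) -> List[Edge]:
    """Full-connection rule: one undirected edge for every pair of boxes."""
    return list(combinations(range(len(boxes)), 2))


def nearest_neighbor_edges(boxes: List[Box], k: int = 2) -> List[Edge]:
    """Assumed adjacency rule: link each box to its k nearest boxes by center distance."""
    centers = [center(b) for b in boxes]
    edges = set()
    for i, (cx, cy) in enumerate(centers):
        others = sorted(
            (j for j in range(len(boxes)) if j != i),
            key=lambda j: hypot(centers[j][0] - cx, centers[j][1] - cy),
        )
        for j in others[:k]:
            # Store each undirected edge once, smaller index first.
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)
```

A full-connection graph grows quadratically with the number of boxes, which is one plausible reason for offering sparser adjacency-based rules as alternatives.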

3. The method of claim 1, wherein the determining node features and edge features of the first graph to be analyzed comprises:

using the text box coordinates and text box image characteristics of the text detection sample box corresponding to each node as the node characteristics of the first graph to be analyzed;

and using, as the edge characteristics of the first graph to be analyzed, the node characteristics of the nodes connected by each edge, the relative distance between the coordinates of the text boxes matched with the edge, the length-width ratios of the text boxes matched with the edge, and the predicted types of the nodes corresponding to the edge.
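A minimal sketch of how the features of claim 3 could be assembled is given below. It assumes boxes are `(x_min, y_min, x_max, y_max)` tuples, that the image characteristics arrive as a precomputed vector, that the relative distance is the coordinate-wise difference between the two boxes, and that the length-width ratio is width divided by height. None of these conventions is fixed by the claim, and the function names are hypothetical.

```python
from typing import List, Sequence


def node_feature(box: Sequence[float], image_feature: Sequence[float]) -> List[float]:
    """Concatenate box coordinates with the box's image characteristics."""
    return [float(v) for v in box] + [float(v) for v in image_feature]


def aspect_ratio(box: Sequence[float]) -> float:
    """Width divided by height (assumed reading of 'length-width ratio')."""
    x0, y0, x1, y1 = box
    return (x1 - x0) / max(y1 - y0, 1e-6)


def edge_feature(
    box_i: Sequence[float],
    box_j: Sequence[float],
    feat_i: List[float],
    feat_j: List[float],
    type_i: int,
    type_j: int,
) -> List[float]:
    """Build the feature vector for the edge between nodes i and j."""
    # Coordinate-wise offset between the two boxes (assumed distance encoding).
    relative = [float(b - a) for a, b in zip(box_i, box_j)]
    ratios = [aspect_ratio(box_i), aspect_ratio(box_j)]
    predicted = [float(type_i), float(type_j)]
    return feat_i + feat_j + relative + ratios + predicted
```

In practice the predicted node types would more likely be encoded as one-hot vectors or class probabilities than as raw integers; the raw values are kept here only for brevity.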

4. The method according to claim 1, wherein inputting the node features and the edge features into an original graph neural network model for classification training of node types and edge types comprises:

inputting the node characteristics into the original graph neural network model, determining node updating characteristics of each node in the first graph to be analyzed, and performing node type classification training on the original graph neural network model according to the node updating characteristics of each node;

and performing edge type classification training on the original graph neural network model according to the node updating characteristics of each node and the edge characteristics.
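The two training objectives of claim 4 can be illustrated with a small forward pass. The sketch below is not the patented model: it assumes a single round of mean-neighbor aggregation as the node update, linear classifiers for both tasks, and a simple sum of the node and edge cross-entropy losses. The function names and weight matrices `w_node` and `w_edge` are hypothetical, and the gradient step that would complete training is omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mean_aggregate(node_feats: List[Vector], edges: List[Tuple[int, int]]) -> List[Vector]:
    """Update each node as the mean of itself and its neighbors (assumed update rule)."""
    neighbors = [[] for _ in node_feats]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    updated = []
    for i, feat in enumerate(node_feats):
        group = [feat] + [node_feats[j] for j in neighbors[i]]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def linear(x: Vector, weights: List[Vector]) -> Vector:
    """Apply a (classes x features) weight matrix to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def cross_entropy(logits: Vector, label: int) -> float:
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def joint_loss(node_feats, edges, edge_feats, node_labels, edge_labels, w_node, w_edge):
    """Node-type loss on updated node features plus edge-type loss on edge inputs."""
    updated = mean_aggregate(node_feats, edges)
    node_loss = sum(
        cross_entropy(linear(h, w_node), y) for h, y in zip(updated, node_labels)
    ) / len(updated)
    edge_loss = 0.0
    for (i, j), e, y in zip(edges, edge_feats, edge_labels):
        # Edge input: updated features of both endpoints plus the edge's own features.
        edge_loss += cross_entropy(linear(updated[i] + updated[j] + e, w_edge), y)
    edge_loss /= max(len(edges), 1)
    return node_loss + edge_loss
```

Because the edge classifier consumes the updated node features, errors on edge types also inform the node representation when the two losses are optimized together.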

5. A layout analysis method, comprising:

acquiring a text image to be analyzed, and inputting the text image to be analyzed into a text detection module to obtain a text detection box;

creating a second graph to be analyzed according to the text detection box, and determining target node characteristics and target edge characteristics of the second graph to be analyzed; the target node in the second graph to be analyzed corresponds to the text detection box;

inputting the target node characteristics and the target edge characteristics into a target graph neural network model to obtain a target node type and a target edge type;

wherein the target graph neural network model is a model obtained by training with the model training method according to any one of claims 1 to 4.

6. The method of claim 5, further comprising, after obtaining the target node type and the target edge type:

taking the target node type of each target node as the layout element type matched with each target node;

determining an instance affiliation matched with each target edge according to the target edge type of each target edge;

and determining the layout analysis result of the text image to be analyzed according to the layout element type matched with each target node and the instance affiliation matched with each target edge.
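One way to read the post-processing of claim 6 is as a grouping problem: text boxes joined by edges classified as belonging to the same instance are merged into one layout element. The sketch below assumes a binary edge label in which `1` means "same instance" and uses a union-find structure to merge nodes; the label convention, the function name `group_instances`, and the output format are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def group_instances(
    node_types: List[str],
    edges: List[Tuple[int, int]],
    edge_types: List[int],
    same_instance_label: int = 1,
) -> List[Dict[str, list]]:
    """Merge nodes linked by 'same instance' edges and report each group's types."""
    parent = list(range(len(node_types)))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), edge_type in zip(edges, edge_types):
        if edge_type == same_instance_label:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                # Keep the smaller index as the root for a stable ordering.
                parent[max(root_i, root_j)] = min(root_i, root_j)

    groups: Dict[int, List[int]] = {}
    for node in range(len(node_types)):
        groups.setdefault(find(node), []).append(node)
    return [
        {"nodes": members, "types": sorted({node_types[m] for m in members})}
        for _, members in sorted(groups.items())
    ]
```

A group whose `types` list holds more than one entry signals a disagreement between node and edge predictions, which a fuller implementation might resolve by majority vote.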

7. A model training device, comprising:

the text detection sample box acquisition module is used for inputting the text image sample into the text detection module to obtain a text detection sample box;

the first feature determining module is used for creating a first graph to be analyzed according to the text detection sample box and determining node features and edge features of the first graph to be analyzed; the nodes in the first graph to be analyzed correspond to the text detection sample box;

and the target graph neural network model determining module is used for inputting the node characteristics and the edge characteristics into an original graph neural network model to perform classification training of node types and edge types, so as to obtain a target graph neural network model.

8. A layout analysis apparatus, comprising:

the text detection box acquisition module is used for acquiring a text image to be analyzed, and inputting the text image to be analyzed into the text detection module to obtain a text detection box;

the second feature determining module is used for creating a second graph to be analyzed according to the text detection box and determining target node features and target edge features of the second graph to be analyzed; the target node in the second graph to be analyzed corresponds to the text detection box;

and the classification module is used for inputting the target node characteristics and the target edge characteristics into a target graph neural network model to obtain a target node type and a target edge type.

9. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1-4 or to perform the layout analysis method of any one of claims 5-6.

10. A computer readable storage medium storing computer instructions for causing a processor to implement the model training method of any one of claims 1-4 or the layout analysis method of any one of claims 5-6 when executed.