CN116229490A - Layout analysis method, device, equipment and medium of graphic neural network - Google Patents

Publication number
CN116229490A
Authority
CN
China
Prior art keywords
node
target
text
graph
analyzed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310246347.3A
Other languages
Chinese (zh)
Inventor
魏舒
陈运文
纪达麒
李巍豪
高翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Datagrand Information Technology Shanghai Co ltd
Original Assignee
Datagrand Information Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Datagrand Information Technology Shanghai Co ltd
Priority to CN202310246347.3A
Publication of CN116229490A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a layout analysis method, device, equipment and medium of a graph neural network. A model training method comprising: inputting the text image sample to a text detection module to obtain a text detection sample box; creating a first graph to be analyzed according to the text detection sample box, and determining node characteristics and edge characteristics of the first graph to be analyzed; inputting the node characteristics and the edge characteristics into an original graph neural network model to perform classification training of node types and edge types, so as to obtain a target graph neural network model; the target graph neural network model is used for carrying out layout analysis on texts in the text images. The technical scheme of the embodiment of the invention can improve the layout analysis performance and reduce the application limitation.

Description

Layout analysis method, device, equipment and medium of graphic neural network
Technical Field
The present invention relates to the field of layout analysis technologies, and in particular, to a layout analysis method, apparatus, device, and medium based on a graph neural network.
Background
The layout analysis task is widely applied in actual life, for example, when an image format document is converted into a text format, an accurate layout analysis result is needed, and then follow-up tasks such as a document comparison task, an information extraction task and the like are performed based on the layout analysis result.
Currently, one common layout analysis scheme is object detection, and another uses Transformer Encoder series models. When layout analysis is performed by object detection, the resulting detection boxes are not very accurate, overlapping detection boxes are difficult to resolve, and post-processing rules are needed to match the detection boxes with the text, so the layout analysis effect is poor. Transformer Encoder series methods, in turn, limit the maximum number of tokens, whereas in practice the number of tokens on a single document page can exceed 512 or even 1024. In addition, Transformer Encoder series models are large and consume substantial computing resources, need to be adapted to specific tokens, and are strongly tied to the language of the documents, so they cannot be migrated directly across languages. Most public datasets in this field, such as DocBank and PubLayNet, which contain hundreds of thousands of samples, are English datasets; datasets in Chinese or other languages are few, and those that exist may suffer from poor annotation quality.
Disclosure of Invention
The invention provides a layout analysis method, device, equipment and medium based on a graph neural network, which are used for solving the problems of poor analysis effect and large application limitation of existing layout analysis methods.
According to an aspect of the present invention, there is provided a model training method including:
inputting the text image sample to a text detection module to obtain a text detection sample box;
creating a first graph to be analyzed according to the text detection sample box, and determining node characteristics and edge characteristics of the first graph to be analyzed; the nodes in the first graph to be analyzed correspond to text detection sample boxes;
inputting the node characteristics and the edge characteristics into an original graph neural network model to perform classification training of node types and edge types, so as to obtain a target graph neural network model;
the target graph neural network model is used for carrying out layout analysis on texts in the text images.
According to another aspect of the present invention, there is provided a layout analysis method including: acquiring a text image to be analyzed, and inputting the text image to be analyzed into a text detection module to obtain a text detection box;
creating a second graph to be analyzed according to the text detection box, and determining target node characteristics and target edge characteristics of the second graph to be analyzed; the target node in the second graph to be analyzed corresponds to the text detection box;
inputting the target node characteristics and the target edge characteristics into a target graph neural network model to obtain a target node type and a target edge type;
The target graph neural network model is a model obtained by training a model training method in any embodiment.
According to another aspect of the present invention, there is provided a model training apparatus comprising:
the text detection sample box acquisition module is used for inputting the text image sample into the text detection module to obtain a text detection sample box;
the first feature determining module is used for creating a first graph to be analyzed according to the text detection sample box and determining node features and edge features of the first graph to be analyzed; the nodes in the first graph to be analyzed correspond to text detection sample boxes;
the target graph neural network model determining module is used for inputting the node characteristics and the edge characteristics into the original graph neural network model to perform classification training of the node types and the edge types, so as to obtain a target graph neural network model;
the target graph neural network model is used for carrying out layout analysis on texts in the text images.
According to another aspect of the present invention, there is provided a layout analysis apparatus comprising:
the text detection box acquisition module is used for acquiring a text image to be analyzed, and inputting the text image to be analyzed into the text detection module to obtain a text detection box;
The second feature determining module is used for creating a second graph to be analyzed according to the text detection box and determining target node features and target edge features of the second graph to be analyzed; the target node in the second graph to be analyzed corresponds to the text detection box;
the classification module is used for inputting the target node characteristics and the target edge characteristics into the target graph neural network model to obtain a target node type and a target edge type;
the target graph neural network model is a model obtained by training a model training method in any embodiment.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the model training method, or layout analysis method, of any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the model training method, or layout analysis method, according to any of the embodiments of the present invention when executed.
According to the technical scheme, the text image sample is input to the text detection module to obtain the text detection sample box, so that a first graph to be analyzed is created according to the text detection sample box, node characteristics and edge characteristics of the first graph to be analyzed are determined, the node characteristics and the edge characteristics are further input to the original graph neural network model to conduct classification training of node types and edge types, a target graph neural network model is obtained, and layout analysis of texts in the text image is achieved by the target graph neural network model. The node type and the edge type can reflect the layout of the text detection sample frame in the text image sample, and the node type and the edge type are classified and trained in the original graph neural network model through the extracted node characteristics and the extracted edge characteristics, so that the finally obtained target graph neural network model has better layout analysis effect, the problems of poor analysis effect and large application limitation of the existing layout analysis method are solved, the layout analysis performance can be improved, and the application limitation is reduced.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a model training method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a model training method according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a layout analysis method according to a third embodiment of the present invention;
FIG. 4 is an algorithm chart of a layout analysis method according to a third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a model training device according to a fourth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a layout analysis device according to a fifth embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device that may be used to implement an embodiment of the invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a model training method according to an embodiment of the present invention, where the method may be implemented by a model training device, and the model training device may be implemented in hardware and/or software, and the model training device may be configured in an electronic device. As shown in fig. 1, the method includes:
S110, inputting the text image sample into a text detection module to obtain a text detection sample box.
The text image sample may be an image sample for layout analysis training. By way of example, text image samples may include, but are not limited to, notes, newspapers, magazines, and the like. The text detection module may be any model with text detection function. The text detection module may be used to detect text regions in an image. The text detection sample box may be a text region border in the text image sample identified by the text detection module.
According to the embodiment of the invention, the text image sample can be obtained according to the layout analysis requirement, and then the text image sample is input into the text detection module, so that the text detection sample box is obtained by identifying the text region in the text image sample according to the text detection module.
S120, creating a first graph to be analyzed according to the text detection sample box, and determining node characteristics and edge characteristics of the first graph to be analyzed.
The first graph to be analyzed may be a graph drawn according to the correlation between the text detection sample boxes. The nodes in the first graph to be analyzed correspond to the text detection sample boxes, and the edges connecting the nodes correspond to the positional relationships, contextual relationships, and the like between the connected nodes. The node feature may be a feature of a node constituting the first graph to be analyzed. The edge feature may be a feature of an edge between nodes in the first graph to be analyzed.
In the embodiment of the invention, the correlation between the text detection sample boxes can be determined according to the positions of the text detection sample boxes in the text image samples and the adjacent relation of the text detection sample boxes, so that a first graph to be analyzed is drawn according to the correlation between the text detection sample boxes, the node characteristics of the first graph to be analyzed and the edge characteristics of edges between the nodes in the first graph to be analyzed are further determined, and the label data corresponding to the node characteristics of the first graph to be analyzed and the label data corresponding to the edge characteristics of the first graph to be analyzed are further acquired.
S130, inputting the node characteristics and the edge characteristics into the original graph neural network model to perform classification training of the node types and the edge types, and obtaining the target graph neural network model.
The original graph neural network model can be any known graph neural network model. The target graph neural network model can be used for layout analysis of text in a text image. The node type may be used to characterize the category of the text detection sample box corresponding to the node. The category of a text detection sample box can be understood as the type of layout element it represents, such as a title, a paragraph, a header, or a footer. The edge type may be used to characterize whether the nodes connected by an edge in the first graph to be analyzed belong together, and can assist in determining whether the text detection sample boxes corresponding to those nodes belong to the same paragraph.
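As an illustration of how edge types can assist in deciding which text detection sample boxes belong to the same paragraph, the following minimal sketch groups nodes connected by positively classified edges. The use of connected components (union-find), the function name `group_boxes`, and the sample predictions are assumptions made for illustration; the source does not prescribe a grouping procedure.

```python
from collections import defaultdict


def group_boxes(num_nodes, edges, edge_is_same_instance):
    """Group node indices into instances by merging nodes joined by positive edges."""
    parent = list(range(num_nodes))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (u, v), positive in zip(edges, edge_is_same_instance):
        if positive:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_v] = root_u

    groups = defaultdict(list)
    for node in range(num_nodes):
        groups[find(node)].append(node)
    return sorted(groups.values())


# Hypothetical predictions for five text boxes (not taken from the source).
node_types = ["title", "paragraph", "paragraph", "paragraph", "footer"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
edge_is_same_instance = [False, True, True, False]

groups = group_boxes(len(node_types), edges, edge_is_same_instance)
print(groups)  # [[0], [1, 2, 3], [4]]
```

In this sketch the three boxes typed as paragraph end up in one group because the edges between them are predicted as belonging together, while the title and footer boxes stay separate.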
In the embodiment of the invention, node characteristics, edge characteristics, label data corresponding to the node characteristics and label data corresponding to the edge characteristics can be input into an original graph neural network model, so that classification training of node types is carried out on the original graph neural network model based on the node characteristics and the label data corresponding to the node characteristics to obtain updated node characteristics, and further, classification training of edge types is carried out on the original graph neural network model according to the updated node characteristics, the input edge characteristics and the label data corresponding to the edge characteristics to obtain a target graph neural network model, so that layout analysis of texts in text images can be carried out through the target graph neural network model.
According to the technical scheme, the text image sample is input to the text detection module to obtain the text detection sample box, so that a first graph to be analyzed is created according to the text detection sample box, node characteristics and edge characteristics of the first graph to be analyzed are determined, the node characteristics and the edge characteristics are further input to the original graph neural network model to conduct classification training of node types and edge types, a target graph neural network model is obtained, and layout analysis of texts in the text image is achieved by the target graph neural network model. The node type and the edge type can reflect the layout of the text detection sample frame in the text image sample, and the node type and the edge type are classified and trained in the original graph neural network model through the extracted node characteristics and the extracted edge characteristics, so that the finally obtained target graph neural network model has better layout analysis effect, the problems of poor analysis effect and large application limitation of the existing layout analysis method are solved, the layout analysis performance can be improved, and the application limitation is reduced.
Example 2
Fig. 2 is a flowchart of a model training method provided by a second embodiment of the present invention, which is implemented based on the foregoing embodiment, and provides a specific optional implementation manner of inputting node features and edge features into an original graph neural network model to perform classification training of node types and edge types. As shown in fig. 2, the method includes:
S210, inputting the text image sample to a text detection module to obtain a text detection sample box.
S220, creating a first graph to be analyzed according to the text detection sample box, and determining node characteristics and edge characteristics of the first graph to be analyzed.
In an alternative embodiment of the present invention, creating the first graph to be analyzed according to the text detection sample box may include: acquiring a preset composition rule; and creating the first graph to be analyzed according to the preset composition rule and the text detection sample box. The preset composition rule may include a full-connection rule, a first adjacent text box edge rule, or a second adjacent text box edge rule.
The preset composition rule may be a preset rule for constructing a connected graph. The full-connection rule may be a rule that creates a connecting edge between any two nodes in the connected graph. The first adjacent text box edge rule may be a rule that creates connecting edges between adjacent nodes using a Beta-skeleton. The second adjacent text box edge rule may be a rule that uses KNN (K-nearest neighbors) to create connecting edges between a node and its K nearest nodes.
In the embodiment of the invention, the preset composition rule may be selected, according to the graph construction requirements of the first graph to be analyzed, from existing rules for constructing edges between the nodes of a connected graph (such as the full-connection rule, the first adjacent text box edge rule, or the second adjacent text box edge rule). Nodes corresponding to the text detection sample boxes are then set up and connected according to the preset composition rule to obtain the first graph to be analyzed.
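The three composition rules can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are given as `(x1, y1, x2, y2)` tuples, distances are measured between box centers, and the Beta-skeleton is shown only for the special case beta = 1 (the Gabriel graph). The function names and sample boxes are hypothetical, and the source does not specify the distance measure or the value of beta.

```python
import math
from itertools import combinations


def box_center(box):
    """Center of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def full_connection_edges(boxes):
    """Full-connection rule: one edge between every pair of nodes."""
    return sorted(combinations(range(len(boxes)), 2))


def knn_edges(boxes, k):
    """KNN rule: connect each node to its k nearest nodes (center distance)."""
    centers = [box_center(b) for b in boxes]
    edges = set()
    for i, center_i in enumerate(centers):
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.dist(center_i, centers[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return sorted(edges)


def gabriel_edges(boxes):
    """Beta-skeleton with beta = 1: keep edge (i, j) when no other center
    lies strictly inside the circle whose diameter is the segment i-j."""
    centers = [box_center(b) for b in boxes]
    edges = []
    for i, j in combinations(range(len(centers)), 2):
        mid = ((centers[i][0] + centers[j][0]) / 2.0,
               (centers[i][1] + centers[j][1]) / 2.0)
        radius = math.dist(centers[i], centers[j]) / 2.0
        if all(math.dist(mid, centers[m]) >= radius
               for m in range(len(centers)) if m not in (i, j)):
            edges.append((i, j))
    return edges


# Three hypothetical text lines stacked vertically on a page.
boxes = [(0, 0, 100, 10), (0, 20, 100, 30), (0, 40, 100, 50)]
full = full_connection_edges(boxes)
knn = knn_edges(boxes, k=1)
gabriel = gabriel_edges(boxes)
```

For the stacked lines above, the full-connection rule yields all three pairs, while the two neighbor-based rules keep only the edges between vertically adjacent lines, which gives a sparser graph.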
S230, inputting the node characteristics into the original graph neural network model, determining node update characteristics of each node in the first graph to be analyzed, and performing node type training on the original graph neural network model according to the node update characteristics of each node.
The node update feature may be a new feature of a node determined by the original graph neural network model according to an intersection node in the first graph to be analyzed. Illustratively, assuming that there is an edge between node 1 and node 2 and an edge between node 1 and node 3, node 1 is the intersection node of node 2 and node 3.
In the embodiment of the invention, after the node characteristics are input into the original graph neural network model, the node characteristics of the first graph to be analyzed are updated according to the node characteristics of the nodes associated with each intersection node respectively by utilizing the original graph neural network model, so that the node update characteristics of each node in the first graph to be analyzed are obtained, and the node type identification training is carried out on the original graph neural network model according to the node update characteristics of each node and the label data corresponding to the node characteristics.
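The idea of updating a node's feature from the nodes it is connected to can be sketched with a single round of mean aggregation. This is an illustrative assumption only: actual graph neural network layers such as GCN or GAT apply learned weights and nonlinearities, which are omitted here. The function name `update_node_features` is hypothetical, and nodes are 0-indexed, so node 0 below plays the role of node 1 in the example above.

```python
def update_node_features(node_features, edges):
    """One round of mean aggregation over each node and its neighbors."""
    neighbors = {i: [i] for i in range(len(node_features))}  # include the node itself
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = []
    for i in range(len(node_features)):
        group = [node_features[j] for j in neighbors[i]]
        dim = len(node_features[i])
        # Average each feature dimension over the node and its neighbors.
        updated.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return updated


# Node 0 is connected to nodes 1 and 2, so it aggregates from both.
node_features = [[0.0, 3.0], [3.0, 0.0], [6.0, 6.0]]
edges = [(0, 1), (0, 2)]
updated = update_node_features(node_features, edges)
print(updated[0])  # [3.0, 3.0]
```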
In an alternative embodiment of the present invention, determining the node feature and the edge feature of the first graph to be analyzed may include: the text box coordinates and the text box image characteristics of the text detection sample box corresponding to each node are used as node characteristics of a first graph to be analyzed; and taking the node characteristics of the connecting nodes of the edges, the relative distance between the coordinates of the text boxes matched with the edges, the length-width ratio of the text boxes matched with the edges and the prediction type of the corresponding nodes of the edges in the first graph to be analyzed as the edge characteristics of the first graph to be analyzed.
The text box coordinates may be the coordinates of the text detection sample box in the text image sample, and may include, but are not limited to, the coordinates of the upper-left and lower-right corners of the text detection sample box. The text box image feature may be the local image feature of the text image sample corresponding to the text detection sample box. The text box aspect ratio may be the ratio of the length to the width of the text detection sample box.
In the embodiment of the invention, the text box coordinates and the text box image features of the text detection sample box corresponding to each node may be obtained according to the position, in the text image sample, of the text detection sample box corresponding to each node in the first graph to be analyzed, and then used as the node features of the first graph to be analyzed. Next, for each edge, the node features of the nodes connected by the edge, the relative distance between the text box coordinates matched with the edge (that is, the relative distance between the text box coordinates of the text detection sample boxes corresponding to the connected nodes), and the text box aspect ratio matched with the edge (that is, the length-to-width ratio of the text detection sample boxes corresponding to the connected nodes) are obtained, and the types of the nodes connected by the edge are predicted to obtain their predicted types. The node features of the connected nodes, the relative distance between the matched text box coordinates, the matched text box aspect ratio, and the predicted types of the connected nodes are then used as the edge features of the first graph to be analyzed.
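A minimal sketch of assembling an edge feature vector from the quantities listed above follows. Several details are assumptions for illustration: boxes are `(x1, y1, x2, y2)` tuples, the relative distance is measured between box centers, the aspect ratio is computed as width divided by height, predicted node types are encoded as integer class ids, and the image features are assumed to be already contained in the node feature lists. The names `node_geometry` and `edge_feature` are hypothetical.

```python
import math


def node_geometry(box):
    """Geometry of a box given by its upper-left and lower-right corners."""
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    return {
        "center": ((x1 + x2) / 2.0, (y1 + y2) / 2.0),
        "aspect_ratio": width / height,  # assumption: width over height
    }


def edge_feature(box_u, box_v, feat_u, feat_v, type_u, type_v):
    """Concatenate the per-edge quantities into one flat feature vector."""
    geo_u, geo_v = node_geometry(box_u), node_geometry(box_v)
    dx = geo_v["center"][0] - geo_u["center"][0]
    dy = geo_v["center"][1] - geo_u["center"][1]
    return (list(feat_u) + list(feat_v)                     # node features of both ends
            + [dx, dy, math.hypot(dx, dy)]                  # relative distance
            + [geo_u["aspect_ratio"], geo_v["aspect_ratio"]]
            + [float(type_u), float(type_v)])               # predicted node types


# Two hypothetical text lines; node features here are just the coordinates.
box_u, box_v = (0, 0, 100, 20), (0, 30, 100, 50)
feature = edge_feature(box_u, box_v, box_u, box_v, type_u=1, type_v=1)
```

With four coordinates per node, the resulting vector has 15 entries: eight node-feature values, three distance values, two aspect ratios, and two predicted types.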
S240, according to the node updating characteristics and the edge characteristics of each node, carrying out edge type classification training on the original graph neural network model to obtain the target graph neural network model.
According to the embodiment of the invention, binary classification training of edge types can be carried out on the original graph neural network model according to the node update features of each node, the edge features, and the label data corresponding to the edge features, so as to obtain the target graph neural network model, with which layout analysis is performed on the text in text images.
In a specific example, a document image (text image sample) may be input to the text detection module to obtain text detection sample boxes, which are then used as nodes to construct the first graph to be analyzed according to a preset composition rule (such as any one of the full-connection rule, the first adjacent text box edge rule, or the second adjacent text box edge rule).
After the first graph to be analyzed is obtained, its node features and edge features can be constructed. Specifically, the text box coordinates of the text detection sample box corresponding to each node in the first graph to be analyzed can be determined; if a text detection sample box is not a rectangle, the text box coordinates can be extended with additional coordinate points. The document image is input into a convolutional neural network to obtain the image features of the document image, and the local image features of the text detection sample box corresponding to each node are then obtained from the image features of the document image and the text box coordinates. Since each edge connects two nodes and represents whether the two text detection sample boxes are related, that is, whether they belong to one instance, edge features can facilitate layout analysis. The node features and edge features are then input into the original graph neural network model for classification training of node types and edge types to obtain the target graph neural network model.
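One way to obtain a local image feature for a text box from the image features of the whole document is to average the feature-map cells that the box covers. The sketch below assumes a feature map stored as nested lists of shape H x W x C and a fixed `stride` (image pixels per feature-map cell); the convolutional network itself is not implemented, and the averaging step is an assumption standing in for whatever pooling the actual system uses. The name `local_box_feature` is hypothetical.

```python
def local_box_feature(feature_map, box, stride):
    """Average the feature-map cells covered by a box given in image pixels."""
    x1, y1, x2, y2 = box
    height, width = len(feature_map), len(feature_map[0])
    # Map pixel coordinates to cell indices, always keeping at least one cell.
    c1 = min(max(int(x1 // stride), 0), width - 1)
    r1 = min(max(int(y1 // stride), 0), height - 1)
    c2 = min(max(-(-int(x2) // stride), c1 + 1), width)   # ceiling division
    r2 = min(max(-(-int(y2) // stride), r1 + 1), height)
    cells = [feature_map[r][c] for r in range(r1, r2) for c in range(c1, c2)]
    channels = len(cells[0])
    return [sum(cell[ch] for cell in cells) / len(cells) for ch in range(channels)]


# A hypothetical 2 x 2 feature map with one channel for a 16 x 16 pixel image.
feature_map = [[[1.0], [2.0]],
               [[3.0], [4.0]]]
top_line = local_box_feature(feature_map, (0, 0, 16, 8), stride=8)
corner = local_box_feature(feature_map, (8, 8, 16, 16), stride=8)
print(top_line)  # [1.5]
```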
The original graph neural network model may include, but is not limited to, GCN, GAT, GATv2, DGCNN, GravNet, and the like. The original graph neural network model can be stacked in multiple layers and is provided with two classification heads: one performs classification training of node types, and the other performs classification training of edge types. The final original graph neural network model is built from these parts, and the choice of each part is adjusted on different datasets to reach the best-performing configuration.
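The two classification heads can be sketched as two small linear classifiers applied to the outputs of the graph layers. This is a sketch under assumptions: the node head is shown as a softmax over node types, the edge head as a sigmoid over the concatenated features of the two connected nodes and the edge, and the weights below are hypothetical, untrained values chosen only to make the example runnable.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """weights holds one row per output; returns weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def node_head(node_feature, weights, bias):
    """Probabilities over node types (e.g. title, paragraph, header, footer)."""
    return softmax(linear(weights, bias, node_feature))


def edge_head(feature_u, feature_v, edge_feature, weights, bias):
    """Probability that the two connected boxes belong to the same instance."""
    x = list(feature_u) + list(feature_v) + list(edge_feature)
    (logit,) = linear(weights, bias, x)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical, untrained parameters for a two-type node head and an edge head.
node_probs = node_head([1.0, 0.0], weights=[[2.0, 0.0], [0.0, 2.0]], bias=[0.0, 0.0])
edge_prob = edge_head([1.0], [1.0], [0.0], weights=[[0.0, 0.0, 0.0]], bias=[0.0])
```

Sharing the graph layers between both heads means the node update features serve both tasks, which matches the description of node type training followed by edge type training on the same model.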
According to the technical scheme of this embodiment, the text image sample is input to the text detection module to obtain the text detection sample box; a first graph to be analyzed is created according to the text detection sample box, and the node features and edge features of the first graph to be analyzed are determined; the node features are input to the original graph neural network model to determine the node update features of each node in the first graph to be analyzed, and node type training is performed on the original graph neural network model according to the node update features of each node; edge type classification training is then performed on the original graph neural network model according to the node update features of each node and the edge features to obtain the target graph neural network model, which is used for layout analysis of text in text images. The node types and edge types reflect the layout of the text detection sample boxes in the text image sample, and classification training of node types and edge types is performed on the original graph neural network model using the extracted node features and edge features, so the resulting target graph neural network model has a better layout analysis effect. This solves the problems of poor analysis effect and large application limitation of existing layout analysis methods, improves layout analysis performance, and reduces application limitations.
The scheme uses the original graph neural network model as its algorithmic core, so the amount of computation is small, and with the node features and edge features of the first graph to be analyzed set reasonably, a good algorithmic effect can be achieved. Because the scheme does not use text information, it is language-independent: it can directly use the large number of public English datasets, is not restricted to a specific language, and has no limit on the number of characters on a single page. It also does not need to match text boxes or perform complex post-processing on overlapping object detection boxes. Compared with existing schemes, it has considerable advantages in overall performance (parameter count, GPU memory usage, inference time, and accuracy) and is suitable for industrial deployment.
Example III
Fig. 3 is a flowchart of a layout analysis method according to a third embodiment of the present invention, where the method may be performed by a layout analysis apparatus, and the layout analysis apparatus may be implemented in hardware and/or software, and the layout analysis apparatus may be configured in an electronic device. As shown in fig. 3, the method includes:
S310, acquiring a text image to be analyzed, and inputting the text image to be analyzed into a text detection module to obtain a text detection box.
The text image to be analyzed can be an image with layout analysis requirements. The text detection box can be a text region frame in the text image to be analyzed, which is identified by the text detection module.
In the embodiment of the invention, the acquired text image to be analyzed can be input into the text detection module, and the text detection box is obtained by identifying the text region in the text image to be analyzed according to the text detection module.
S320, creating a second graph to be analyzed according to the text detection box, and determining target node characteristics and target edge characteristics of the second graph to be analyzed.
The second graph to be analyzed may be a graph drawn according to the correlation between the text detection boxes. A target node is a node in the second graph to be analyzed and corresponds to a text detection box. A target edge is an edge in the second graph to be analyzed that connects target nodes, and it corresponds to the positional relationship, contextual relationship, and the like of the target nodes it connects. The target node characteristic may be a characteristic of a target node, and the target edge characteristic may be a characteristic of a target edge.
In the embodiment of the invention, the correlation between the text detection boxes can be determined according to the positions of the text detection boxes in the text image to be analyzed and the adjacent relation of the text detection boxes, so that a second graph to be analyzed is drawn according to the correlation between the text detection boxes, and further the target node characteristics and the target edge characteristics of the second graph to be analyzed are determined.
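The graph construction step can be sketched as follows. The full-connection rule is named in this document; the nearest-neighbour rule shown here is only a hypothetical stand-in for an adjacency-based rule, and the box format `(x1, y1, x2, y2)` and all function names are assumptions for illustration.

```python
from itertools import combinations

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def fully_connected_edges(boxes):
    """Full-connection rule: one edge for every pair of boxes."""
    return list(combinations(range(len(boxes)), 2))

def nearest_neighbour_edges(boxes, k=2):
    """Hypothetical adjacency rule: link each box to its k closest boxes by centre distance."""
    centers = [center(b) for b in boxes]
    edges = set()
    for i, (cx, cy) in enumerate(centers):
        others = sorted(
            (j for j in range(len(boxes)) if j != i),
            key=lambda j: (centers[j][0] - cx) ** 2 + (centers[j][1] - cy) ** 2,
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))        # store undirected edges once
    return sorted(edges)

# Three stacked lines near the top of a page and one line near the bottom.
boxes = [(10, 10, 200, 30), (10, 35, 200, 55), (10, 60, 120, 80), (10, 300, 200, 320)]
dense = fully_connected_edges(boxes)
sparse = nearest_neighbour_edges(boxes, k=1)
print(len(dense), sparse)
```

Each index in an edge refers to a text detection box, so a node of the graph and its box share the same position in `boxes`.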
S330, inputting the target node characteristics and the target edge characteristics into a target graph neural network model to obtain the target node type and the target edge type.
The target graph neural network model can be a model trained by the model training method of any embodiment of the invention. The target node type characterizes the category to which a text detection box belongs, that is, the target node type is the same as the category of the corresponding text detection box. The target edge type indicates whether the two nodes connected by an edge in the second graph to be analyzed belong to the same instance, where an instance can be understood as a paragraph.
In the embodiment of the invention, the target node characteristics and the target edge characteristics can be input into the target graph neural network model to obtain the target node type and the target edge type, so that the layout position of each text detection box in the text image to be analyzed is determined according to the target node type and the target edge type, and the layout analysis of the text image to be analyzed is realized.
In an optional embodiment of the present invention, after obtaining the target node type and the target edge type, the method may further include: determining the layout element type matched with each target node according to the target node type of each target node; determining the instance affiliation matched with each target edge according to the target edge type of each target edge; and determining the layout analysis result of the text image to be analyzed according to the layout element type matched with each target node and the instance affiliation matched with each target edge.
The layout element type may be an element type constituting a layout. The instance affiliation describes whether text detection boxes belong to the same instance, where an instance may be understood as a paragraph. The layout analysis result may be an analysis result of the text layout corresponding to the text image to be analyzed.
In the embodiment of the invention, the target node type of each target node can be used as the layout element type matched with that node, and the instance affiliation matched with each target edge is determined according to the target edge type of that edge. The layout of the text in the text image to be analyzed is then determined from the layout element types and the instance affiliations, giving the layout analysis result of the text image to be analyzed, so that the text in the text image to be analyzed can be further laid out based on this result.
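One way to turn the predicted node types and edge types into a layout analysis result is to merge boxes joined by "same instance" edges. The sketch below uses a union-find structure for that merge. The function name `group_instances`, the label strings, the encoding of edge types as 1 (same instance) and 0 (different instance), and the choice of taking the type of the first box in each group are all assumptions for illustration, not details given in this document.

```python
def group_instances(node_types, edges, edge_types):
    """Merge boxes linked by 'same instance' edges and label each group."""
    parent = list(range(len(node_types)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]            # path halving
            i = parent[i]
        return i

    for (u, v), same_instance in zip(edges, edge_types):
        if same_instance:
            parent[find(u)] = find(v)

    groups = {}
    for i in range(len(node_types)):
        groups.setdefault(find(i), []).append(i)
    # Assumption: the group takes the type of its first box.
    return [
        {"type": node_types[members[0]], "boxes": members}
        for members in sorted(groups.values())
    ]

node_types = ["title", "paragraph", "paragraph", "footer"]
edges = [(0, 1), (1, 2), (2, 3)]
edge_types = [0, 1, 0]                               # only boxes 1 and 2 share an instance
result = group_instances(node_types, edges, edge_types)
print(result)
```

In this example boxes 1 and 2 end up in one paragraph instance, while the title and the footer each form an instance of their own.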
Fig. 4 is an algorithm logic diagram of a layout analysis method according to a third embodiment of the present invention. As shown in fig. 4, a text image to be analyzed is first obtained and input to the text detection module, which performs text detection to obtain text detection boxes. A connected graph is constructed from the text detection boxes to obtain the second graph to be analyzed, and the target node characteristics and target edge characteristics of that graph are then constructed. The target graph neural network model updates the target node characteristics to obtain updated target node characteristics. The target nodes are classified based on the updated target node characteristics to obtain the target node types, and the target edges are classified from their target edge characteristics together with the updated target node characteristics to obtain the target edge types. Finally, the layout analysis result is determined from the target node types and the target edge types.
When documents are compared, layout analysis must first be performed on each document, and the comparison is then carried out layout element type by layout element type. The present invention takes the text image to be analyzed and the text detection boxes obtained by OCR (Optical Character Recognition), predicts the category of each text detection box (for example title, paragraph, header, or footer), and predicts whether adjacent text detection boxes belong to the same instance (for example, whether text box A and text box B belong to the same paragraph). A target detection algorithm, by contrast, takes only the image as input and has no prior knowledge of the text detection boxes; if a predicted box is slightly wider or slightly narrower than the correct box, a paragraph may gain or lose a line, which is fatal for a comparison task. Using the target graph neural network model greatly reduces the probability of such errors at the level of algorithm design. Even if an erroneous sample does occur, it can simply be added to the training data set and is fitted quickly, whereas it is difficult to force the predicted boxes of a target detection algorithm to be 100% accurate.
When a document is parsed, layout analysis must first be performed on double-column or multi-column documents in order to obtain the correct reading order. The invention judges the relationship between adjacent text detection boxes and can therefore quickly identify the gap between the columns. Transformer encoder based algorithms, by contrast, impose a maximum on the number of characters per page (for an ordinary newspaper page it is quite normal for the number of tokens to exceed 512). In addition, when judging the relationship between tokens, the usual approach is to compute a score for every pair of tokens through full connection, which consumes resources unnecessarily and makes the algorithm harder to train.
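The cost of pairwise scoring can be made concrete with a short calculation. The 512-token figure comes from the paragraph above; the page with 60 text detection boxes and 4 neighbours per box is purely illustrative, and both function names are hypothetical.

```python
def full_connection_pairs(num_tokens):
    """Number of unordered token pairs scored under full connection."""
    return num_tokens * (num_tokens - 1) // 2

def sparse_edge_upper_bound(num_boxes, neighbours_per_box):
    """Upper bound on edges when each box links to a fixed number of neighbours."""
    return num_boxes * neighbours_per_box

token_pairs = full_connection_pairs(512)             # page with 512 tokens
box_edges = sparse_edge_upper_bound(60, 4)           # illustrative: 60 boxes, 4 neighbours each
print(token_pairs, box_edges)                        # prints: 130816 240
```

The pair count grows quadratically with the number of tokens, whereas a graph over text detection boxes with a bounded number of neighbours grows linearly with the number of boxes.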
According to the technical solution of this embodiment, a text image to be analyzed is acquired and input to the text detection module to obtain text detection boxes, a second graph to be analyzed is created from the text detection boxes, and the target node characteristics and target edge characteristics of the second graph to be analyzed are determined. The target node characteristics and target edge characteristics are then input to the target graph neural network model to obtain the target node type and the target edge type. Because the target node type and the target edge type reflect the layout of the text detection boxes in the text image to be analyzed, and both can be determined by the target graph neural network model, the layout analysis effect is improved and the method is not constrained by a limit on the number of characters detected. This addresses the poor analysis effect and heavy application limitations of existing layout analysis methods, improving layout analysis performance and reducing application limitations.
Example IV
Fig. 5 is a schematic structural diagram of a model training device according to a fourth embodiment of the present invention. As shown in fig. 5, the apparatus includes: a text detection sample box acquisition module 410, a first feature determination module 420, and a target graph neural network model determination module 430;
A text detection sample box acquisition module 410, configured to input a text image sample to the text detection module to obtain a text detection sample box;
the first feature determining module 420 is configured to create a first graph to be analyzed according to the text detection sample box, and determine node features and edge features of the first graph to be analyzed; the nodes in the first graph to be analyzed correspond to text detection sample boxes;
the target graph neural network model determining module 430 is configured to input the node features and the edge features into the original graph neural network model to perform classification training of the node types and the edge types, so as to obtain a target graph neural network model;
the target graph neural network model is used for carrying out layout analysis on texts in the text images.
According to the technical solution of this embodiment, a text image sample is input to the text detection module to obtain text detection sample boxes, and a first graph to be analyzed is created from those boxes. The node characteristics and edge characteristics of the first graph to be analyzed are determined and input to the original graph neural network model for classification training of node types and edge types, yielding a target graph neural network model that performs layout analysis on the text in a text image. Because the node types and edge types reflect the layout of the text detection sample boxes in the text image sample, training the original graph neural network model to classify them from the extracted node characteristics and edge characteristics gives the resulting target graph neural network model a better layout analysis effect. This addresses the poor analysis effect and heavy application limitations of existing layout analysis methods, improving layout analysis performance and reducing application limitations.
Optionally, the first feature determining module 420 includes a first to-be-analyzed graph creating unit, configured to obtain a preset composition rule and to create the first graph to be analyzed according to the preset composition rule and the text detection sample box; the preset composition rule comprises a full-connection rule, a first adjacent text box edge building rule or a second adjacent text box edge building rule.
Optionally, the first feature determining module 420 includes a first feature determining unit, configured to use the text box coordinates and text box image features of the text detection sample box corresponding to each node as the node features of the first graph to be analyzed; and to use the node features of the nodes connected by each edge, the relative distance between the coordinates of the text boxes matched with the edge, the aspect ratio of the text boxes matched with the edge, and the predicted type of the nodes corresponding to the edge as the edge features of the first graph to be analyzed.
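The geometric part of the edge features can be sketched as below. Only the relative distance between box coordinates and the aspect ratios are shown; the node features of the connected nodes and their predicted types are left out. Normalising by page size, the box format `(x1, y1, x2, y2)`, and the function names are assumptions made for illustration.

```python
def aspect_ratio(box):
    """Width divided by height of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x2 - x1) / (y2 - y1)

def edge_features(box_a, box_b, page_width, page_height):
    """Relative coordinate offsets (normalised by page size) plus both aspect ratios."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return [
        (bx1 - ax1) / page_width,
        (by1 - ay1) / page_height,
        (bx2 - ax2) / page_width,
        (by2 - ay2) / page_height,
        aspect_ratio(box_a),
        aspect_ratio(box_b),
    ]

line_1 = (10, 10, 210, 30)
line_2 = (10, 35, 210, 55)                           # next line, same left and right margins
features = edge_features(line_1, line_2, page_width=1000, page_height=1000)
print(features)
```

For two consecutive lines of one paragraph the horizontal offsets are zero and the vertical offsets are small, which is the kind of signal an edge classifier can use.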
Optionally, the target graph neural network model determining module 430 is specifically configured to input the node characteristics into the original graph neural network model, determine node update characteristics of each node in the first graph to be analyzed, and perform node type training on the original graph neural network model according to the node update characteristics of each node; and according to the node updating characteristics of each node and the edge characteristics, carrying out edge type classification training on the original graph neural network model.
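Training both heads together needs one scalar objective. The document does not specify the loss, so the sketch below assumes a cross-entropy loss per head and a weighted sum of the two; `cross_entropy`, `joint_loss`, and `edge_weight` are hypothetical names.

```python
import math

def cross_entropy(logits, label):
    """Negative log-probability of the true label under a softmax over the logits."""
    m = max(logits)                                  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def mean_loss(all_logits, labels):
    return sum(cross_entropy(l, y) for l, y in zip(all_logits, labels)) / len(labels)

def joint_loss(node_logits, node_labels, edge_logits, edge_labels, edge_weight=1.0):
    """Assumed objective: node-type loss plus weighted edge-type loss."""
    return mean_loss(node_logits, node_labels) + edge_weight * mean_loss(edge_logits, edge_labels)

# Untrained heads that score every class equally: 3 node classes, 2 edge classes.
loss = joint_loss([[0.0, 0.0, 0.0]], [0], [[0.0, 0.0]], [1])
print(loss)                                          # log(3) + log(2) = log(6)
```

The `edge_weight` parameter lets the balance between the two tasks be tuned per data set.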
The model training device provided by the embodiment of the invention can execute the model training method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example V
Fig. 6 is a schematic structural diagram of a layout analysis device according to a fifth embodiment of the present invention. As shown in fig. 6, the apparatus includes: a text detection box acquisition module 510, a second feature determination module 520, and a classification module 530;
the text detection box acquisition module 510 is configured to acquire a text image to be analyzed, and input the text image to be analyzed to the text detection module to obtain a text detection box;
a second feature determining module 520, configured to create a second graph to be analyzed according to the text detection box, and determine a target node feature and a target edge feature of the second graph to be analyzed; the target node in the second graph to be analyzed corresponds to the text detection box;
the classification module 530 is configured to input the target node feature and the target edge feature to a target graph neural network model, so as to obtain a target node type and a target edge type.
According to the technical solution of this embodiment, a text image to be analyzed is acquired and input to the text detection module to obtain text detection boxes, a second graph to be analyzed is created from the text detection boxes, and the target node characteristics and target edge characteristics of the second graph to be analyzed are determined. The target node characteristics and target edge characteristics are then input to the target graph neural network model to obtain the target node type and the target edge type. Because the target node type and the target edge type reflect the layout of the text detection boxes in the text image to be analyzed, and both can be determined by the target graph neural network model, the layout analysis effect is improved and the method is not constrained by a limit on the number of characters detected. This addresses the poor analysis effect and heavy application limitations of existing layout analysis methods, improving layout analysis performance and reducing application limitations.
Optionally, the layout analysis device further includes a layout analysis result determining module, configured to use the target node type of each target node as the layout element type matched with each target node; determine the instance affiliation matched with each target edge according to the target edge type of each target edge; and determine the layout analysis result of the text image to be analyzed according to the layout element type matched with each target node and the instance affiliation matched with each target edge.
The layout analysis device provided by the embodiment of the invention can execute the layout analysis method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example VI
Fig. 7 shows a schematic diagram of an electronic device that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, such as the model training method or the layout analysis method.
In some embodiments, the model training method, or layout analysis method, may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the model training method described above, or the layout analysis method, may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the model training method, or layout analysis method, in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of model training, comprising:
inputting the text image sample to a text detection module to obtain a text detection sample box;
creating a first graph to be analyzed according to the text detection sample box, and determining node characteristics and edge characteristics of the first graph to be analyzed; the nodes in the first graph to be analyzed correspond to the text detection sample box;
inputting the node characteristics and the edge characteristics into an original graph neural network model to perform classification training of node types and edge types, so as to obtain a target graph neural network model;
The target graph neural network model is used for carrying out layout analysis on texts in the text images.
2. The method of claim 1, wherein creating a first graph to be analyzed from the text detection sample box comprises:
acquiring a preset composition rule;
creating the first graph to be analyzed according to the preset composition rule and the text detection sample box;
the preset composition rule comprises a full-connection rule, a first adjacent text box edge building rule or a second adjacent text box edge building rule.
3. The method of claim 1, wherein the determining node features and edge features of the first graph to be analyzed comprises:
using text box coordinates and text box image characteristics of the text detection sample boxes corresponding to the nodes as node characteristics of the first to-be-analyzed graph;
and taking the node characteristics of the connecting nodes of the edges, the relative distance between the coordinates of the text boxes matched with the edges, the length-width ratio of the text boxes matched with the edges and the prediction type of the corresponding nodes of the edges in the first graph to be analyzed as the edge characteristics of the first graph to be analyzed.
4. The method according to claim 1, wherein inputting the node features and the edge features into an original graph neural network model for classification training of node types and edge types comprises:
Inputting the node characteristics into the original graph neural network model, determining node updating characteristics of each node in the first graph to be analyzed, and performing node type training on the original graph neural network model according to the node updating characteristics of each node;
and according to the node updating characteristics of each node and the edge characteristics, carrying out edge type classification training on the original graph neural network model.
5. A layout analysis method, comprising:
acquiring a text image to be analyzed, and inputting the text image to be analyzed into a text detection module to obtain a text detection box;
creating a second graph to be analyzed according to the text detection box, and determining target node characteristics and target edge characteristics of the second graph to be analyzed; the target node in the second graph to be analyzed corresponds to the text detection box;
inputting the target node characteristics and the target edge characteristics into a target graph neural network model to obtain a target node type and a target edge type;
wherein the target graph neural network model is a model obtained by training the model training method according to any one of claims 1 to 4.
6. The method of claim 5, further comprising, after the obtaining the target node type and the target edge type:
taking the target node type of each target node as the layout element type matched with each target node;
determining an instance affiliation matched with each target edge according to the target edge type of each target edge;
and determining the layout analysis result of the text image to be analyzed according to the layout element type matched with each target node and the instance affiliation matched with each target edge.
7. A model training device, comprising:
the text detection sample box acquisition module is used for inputting the text image sample into the text detection module to obtain a text detection sample box;
the first feature determining module is used for creating a first graph to be analyzed according to the text detection sample box and determining node features and edge features of the first graph to be analyzed; the nodes in the first graph to be analyzed correspond to the text detection sample box;
the target graph neural network model determining module is used for inputting the node characteristics and the edge characteristics into an original graph neural network model to perform classification training of node types and edge types, so as to obtain a target graph neural network model;
The target graph neural network model is used for carrying out layout analysis on texts in the text images.
8. A layout analysis apparatus, comprising:
the text detection box acquisition module is used for acquiring a text image to be analyzed, and inputting the text image to be analyzed into the text detection module to obtain a text detection box;
the second feature determining module is used for creating a second graph to be analyzed according to the text detection box and determining target node features and target edge features of the second graph to be analyzed; the target node in the second graph to be analyzed corresponds to the text detection box;
and the classification module is used for inputting the target node features and the target edge features into a target graph neural network model to obtain a target node type and a target edge type.
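The graph construction step handled by the second feature determining module can be sketched as follows. The claims in this section do not specify how nodes are connected or which features are used, so everything here is an assumption made for illustration: node features are box coordinates normalized by image size, each node is linked to its k nearest neighbours by centre distance, and edge features are the relative centre offset plus its length. The function name `build_graph` and its parameters are hypothetical.

```python
import math


def build_graph(boxes, image_width, image_height, k=2):
    """Build a graph from text detection boxes (illustrative only).

    boxes: list of (x_min, y_min, x_max, y_max) in pixels.
    Returns (node_features, edges, edge_features).
    """
    # Assumed node feature: box corners normalized to [0, 1].
    node_features = [
        (x0 / image_width, y0 / image_height, x1 / image_width, y1 / image_height)
        for x0, y0, x1, y1 in boxes
    ]
    centers = [((f[0] + f[2]) / 2, (f[1] + f[3]) / 2) for f in node_features]

    edges, edge_features = [], []
    for i, (cx, cy) in enumerate(centers):
        # Assumed connectivity: directed edges to the k nearest box centres.
        neighbours = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.hypot(centers[j][0] - cx, centers[j][1] - cy),
        )[:k]
        for j in neighbours:
            dx, dy = centers[j][0] - cx, centers[j][1] - cy
            edges.append((i, j))
            # Assumed edge feature: relative offset and Euclidean distance.
            edge_features.append((dx, dy, math.hypot(dx, dy)))
    return node_features, edges, edge_features


nodes, edges, edge_feats = build_graph(
    [(0, 0, 10, 10), (20, 0, 30, 10), (0, 90, 10, 100)],
    image_width=100,
    image_height=100,
    k=1,
)
```

The resulting node and edge features would then be passed to the trained graph neural network model, which is outside the scope of this sketch.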
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1-4 or to perform the layout analysis method of any one of claims 5-6.
10. A computer readable storage medium storing computer instructions for causing a processor to implement the model training method of any one of claims 1-4 or the layout analysis method of any one of claims 5-6 when executed.
CN202310246347.3A 2023-03-14 2023-03-14 Layout analysis method, device, equipment and medium of graphic neural network Pending CN116229490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310246347.3A CN116229490A (en) 2023-03-14 2023-03-14 Layout analysis method, device, equipment and medium of graphic neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310246347.3A CN116229490A (en) 2023-03-14 2023-03-14 Layout analysis method, device, equipment and medium of graphic neural network

Publications (1)

Publication Number Publication Date
CN116229490A true CN116229490A (en) 2023-06-06

Family

ID=86587229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310246347.3A Pending CN116229490A (en) 2023-03-14 2023-03-14 Layout analysis method, device, equipment and medium of graphic neural network

Country Status (1)

Country Link
CN (1) CN116229490A (en)

Similar Documents

Publication Publication Date Title
US20220253631A1 (en) Image processing method, electronic device and storage medium
CN112966522A (en) Image classification method and device, electronic equipment and storage medium
US11861919B2 (en) Text recognition method and device, and electronic device
CN112949476B (en) Text relation detection method, device and storage medium based on graph convolution neural network
CN113657274B (en) Table generation method and device, electronic equipment and storage medium
CN112541332B (en) Form information extraction method and device, electronic equipment and storage medium
US20240304015A1 (en) Method of training deep learning model for text detection and text detection method
CN115546488B (en) Information segmentation method, information extraction method and training method of information segmentation model
CN115546809A (en) Table structure identification method based on cell constraint and application thereof
CN114429633A (en) Text recognition method, model training method, device, electronic equipment and medium
CN114724156B (en) Form identification method and device and electronic equipment
CN115331247A (en) Document structure identification method and device, electronic equipment and readable storage medium
CN114187448A (en) Document image recognition method and device, electronic equipment and computer readable medium
CN114092948A (en) Bill identification method, device, equipment and storage medium
CN114495101A (en) Text detection method, and training method and device of text detection network
CN117496521A (en) Method, system and device for extracting key information of table and readable storage medium
CN114120305B (en) Training method of text classification model, and text content recognition method and device
CN115797955A (en) Table structure identification method based on cell constraint and application thereof
CN116229490A (en) Layout analysis method, device, equipment and medium of graphic neural network
CN115601620A (en) Feature fusion method and device, electronic equipment and computer readable storage medium
JP4466241B2 (en) Document processing method and document processing apparatus
CN114299522B (en) Image recognition method device, apparatus and storage medium
CN114911963B (en) Template picture classification method, device, equipment, storage medium and product
CN115497112B (en) Form recognition method, form recognition device, form recognition equipment and storage medium
CN116363681A (en) Text type recognition method, device, equipment and medium based on document image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination