CN112949476B - Text relation detection method, device and storage medium based on graph convolution neural network - Google Patents


Info

Publication number
CN112949476B
Authority
CN
China
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202110224515.XA
Other languages
Chinese (zh)
Other versions
CN112949476A
Inventor
熊玉竹
侯绍东
周以晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Meinenghua Intelligent Technology Co ltd
Original Assignee
Suzhou Meinenghua Intelligent Technology Co ltd
Application filed by Suzhou Meinenghua Intelligent Technology Co ltd filed Critical Suzhou Meinenghua Intelligent Technology Co ltd
Priority to CN202110224515.XA
Publication of CN112949476A
Application granted
Publication of CN112949476B


Classifications

    • G06V 30/413 — Document-oriented image-based pattern recognition; analysis of document content; classification of content, e.g. text, photographs or tables
    • G06F 40/126 — Handling natural language data; text processing; character encoding
    • G06N 3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
    • G06V 10/44 — Extraction of image or video features; local feature extraction, e.g. edges, contours, connected components


Abstract

The application relates to a text relation detection method, device and storage medium based on a graph convolutional neural network, belonging to the technical field of computers. The method comprises the following steps: acquiring a plurality of key information blocks of text information in a target image, wherein each text block in the key information blocks comprises at least one character string; inputting the character strings of each text block in each key information block into a node feature extraction model to obtain the node features of the key information block; constructing a connectivity relation between each text block in a key information block and each text block in the other key information blocks; determining the edge features of the key information blocks based on the connectivity relations corresponding to the key information blocks and the position information corresponding to those connectivity relations; inputting the node features and edge features into a pre-trained graph convolutional neural network to obtain the edge types among the key information blocks; and determining that key information blocks with the same edge type have an association relation. The accuracy and efficiency of association relation identification can thereby be improved.

Description

Text relation detection method, device and storage medium based on graph convolution neural network
[Technical Field]
The application relates to a text relation detection method, a device and a storage medium based on a graph convolution neural network, and belongs to the technical field of computers.
[Background Art]
Text relationship detection is a common need in the field of natural language processing. Briefly, text relationship detection classifies the entities of interest contained in a document by relationship type. For example, extracting the key information and the association relations between pieces of information from documents such as bills and logistics sheets structures the document information and thereby improves practitioners' work efficiency.
Starting from original files such as bills and logistics sheets, Optical Character Recognition (OCR) is first used to identify the characters and their position information, and text block nodes are obtained by aggregating characters according to a distance threshold. The document information can then be structured through two stages: in the first stage, a document information extraction model takes the aggregated text block nodes as input and extracts key information blocks; in the second stage, the key information blocks are taken as input to detect the relations between them.
After the key information blocks are extracted, the most common current method is to write logic rules based on the position information of the key information blocks, judging the association relations between them with manually set distance thresholds in the horizontal and vertical directions. Another approach distinguishes keys from values among the key information blocks and detects the association of key-value pairs by building a deep learning model, thereby obtaining the association relations between key information blocks.
However, judging the association relations between key information blocks with position-based logic rules is rough: the choice of distance threshold depends on manual experience and is strongly sample-dependent, suitable thresholds differ from file to file, and for some files — such as those that do not follow a normal page layout — the detected relations are unreasonable.
Information in a normal file appears row by row, with clear divisions between rows, and text in the same row can be considered to have the simplest association relation, namely row association. In a document that does not follow the normal page layout, however, part of the text may be too long or offset so that it appears in other lines; a distance threshold may then assign that text to the wrong line, causing association detection errors. Moreover, many document files have loose layout requirements: a certain proportion have no table lines, there is no obvious separation between rows and columns, and staggered rows and columns are common.
The other method distinguishes the keys and values of the key information blocks and then detects key-value pair relations. The currently known detection types are only "matching" and "non-matching", whereas in actual business scenarios the relation types between texts are various, so detecting only whether a match exists is not flexible enough to meet practical use. Because this approach adds constraints on the relations between keys and values, it cannot support detecting text relations where all the values lie in one row or column.
[Summary of the Application]
The application provides a text relation detection method, device and storage medium based on a graph convolutional neural network, which can solve the problems that determining association relations by position-based logic rules produces unreasonable detection results and that the detection mode is inflexible. The application provides the following technical scheme:
in a first aspect, a text relationship detection method based on a graph convolution neural network is provided, the method comprising:
acquiring a plurality of key information blocks of text information in a target image, wherein the key information blocks comprise a plurality of text blocks, and each text block comprises at least one character string;
inputting the character strings of each text block in each key information block into a node characteristic extraction model to obtain node characteristics of the key information blocks;
for each key information block in the plurality of key information blocks, constructing a connectivity relation between each text block in the key information block and each text block in the other key information blocks;
determining the edge features of each key information block based on the connectivity relations corresponding to each key information block and the position information corresponding to each connectivity relation;
inputting the node characteristics and the edge characteristics into a pre-trained graph convolution neural network to obtain edge types among key information blocks;
and determining that the key information blocks with the same edge type have an association relationship.
Optionally, the determining the edge features of each key information block based on the connectivity relations corresponding to each key information block and the position information corresponding to each connectivity relation includes:
for each connectivity relation, determining the sub-edge feature corresponding to the connectivity relation according to the relative position between the two text blocks it connects;
for each key information block, determining the sub-edge features corresponding to the connectivity relations of each text block in the key information block;
and generating the edge features based on the sub-edge features corresponding to each key information block.
Optionally, for each connectivity relation, determining the sub-edge feature corresponding to the connectivity relation according to the relative position between the two text blocks it connects includes:
discretizing the relative position according to the direction and the distance to obtain a direction code and a distance code;
inputting the direction code and the distance code into an embedding model to obtain a direction embedded code, a horizontal distance embedded code and a vertical distance embedded code;
and splicing the direction embedded code, the horizontal distance embedded code and the vertical distance embedded code, and then projecting to obtain a vector with a fixed length, thereby obtaining the sub-edge feature.
Optionally, the generating the edge feature based on the sub-edge feature corresponding to each key information block includes:
processing each sub-edge feature into the same dimension;
and processing each processed sub-edge feature into a first fixed dimension to obtain the edge feature.
Optionally, the generating the edge feature based on the sub-edge feature corresponding to each key information block includes:
for each key information block, acquiring a connectivity relation matching table formed from the connectivity relations of the key information block;
and searching the corresponding set of sub-edge feature vectors in a vector table formed from the sub-edge features according to the connectivity relation matching table, to obtain the edge features of the key information block.
Optionally, the inputting the character string of each text block in each key information block into the node feature extraction model to obtain the node feature of the key information block includes:
for each text block, inputting the character strings in the text block into a pre-trained recurrent neural network (RNN) to obtain a feature vector of each character string;
and processing the feature vector of each character string into a second fixed dimension to obtain the node feature.
Optionally, the inputting the node feature and the edge feature into a pre-trained graph convolution neural network to obtain an edge type between each key information block includes:
for each key information block, calculating target node information through the graph convolutional neural network using the node features of the key information block together with the node features and edge features of the key information blocks that have a connectivity relation with it;
and splicing information of each node associated with the edge, and calculating the attribute of the edge through a multi-layer forward network to obtain the edge type.
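The edge-type computation in the optional step above — splice the information of the two nodes associated with an edge, then classify the edge through a multi-layer forward network — can be sketched as follows. The hidden size, the number of edge types and the two-layer forward network are illustrative assumptions; the patent does not fix them.

```python
import numpy as np

rng = np.random.default_rng(0)
HID, N_TYPES = 32, 3                                  # assumed node-vector size and edge-type count
W1 = rng.normal(size=(2 * HID, HID)) * 0.1            # forward-network layer 1
W2 = rng.normal(size=(HID, N_TYPES)) * 0.1            # forward-network layer 2

def edge_type(h_i, h_j):
    """Splice the two node vectors of an edge and classify the edge's type."""
    z = np.concatenate([h_i, h_j])                    # concatenated node information
    logits = np.maximum(z @ W1, 0.0) @ W2             # multi-layer forward network (ReLU)
    return int(np.argmax(logits))                     # predicted edge-type index

t = edge_type(rng.normal(size=HID), rng.normal(size=HID))
print(0 <= t < N_TYPES)  # True
```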
In a second aspect, there is provided a text relationship detection apparatus based on a graph convolutional neural network, the apparatus comprising:
the key information acquisition module is used for acquiring a plurality of key information blocks of text information in a target image, wherein the key information blocks comprise a plurality of text blocks, and each text block comprises at least one character string;
the node characteristic extraction module is used for inputting the character strings of each text block in each key information block into the node characteristic extraction model to obtain the node characteristics of the key information block;
the connectivity relation construction module is used for constructing, for each key information block in the plurality of key information blocks, a connectivity relation between each text block in the key information block and each text block in the other key information blocks;
the edge feature extraction module is used for determining the edge features of the key information blocks based on the connectivity relations corresponding to the key information blocks and the position information corresponding to those connectivity relations;
the edge type calculation module is used for inputting the node characteristics and the edge characteristics into a pre-trained graph convolution neural network to obtain edge types among the key information blocks;
and the association relation determining module is used for determining that the key information blocks with the same edge type have association relation.
In a third aspect, a text relationship detection apparatus based on a graph convolutional neural network is provided, the apparatus comprising a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the text relationship detection method based on a graph convolutional neural network of the first aspect.
In a fourth aspect, a computer readable storage medium is provided, in which a program is stored, the program being loaded and executed by a processor to implement the text relationship detection method based on a graph convolutional neural network according to the first aspect.
The application has the following beneficial effects: a plurality of key information blocks of the text information in a target image are acquired, each key information block comprising a plurality of text blocks and each text block comprising at least one character string; the character strings of each text block in each key information block are input into a node feature extraction model to obtain the node features of the key information block; for each key information block, a connectivity relation is constructed between each of its text blocks and each text block in the other key information blocks; the edge features of the key information blocks are determined based on the connectivity relations corresponding to the key information blocks and the position information corresponding to those connectivity relations; the node features and edge features are input into a pre-trained graph convolutional neural network to obtain the edge types among the key information blocks; and key information blocks with the same edge type are determined to have an association relation. This solves the problem that determining association relations by position-based logic rules produces unreasonable detection results, and improves the accuracy of association relation identification. Meanwhile, the relation detection method provided by the application does not need to distinguish the keys and values of the key information blocks and can detect association relations directly, which improves the efficiency of association relation detection.
The foregoing is only an overview of the technical scheme of the present application. To allow a clearer understanding and implementation of the technical means of the application, it is described below with reference to its preferred embodiments and the accompanying drawings.
[ description of the drawings ]
FIG. 1 is a flowchart of a text relationship detection method based on a graph convolutional neural network according to one embodiment of the present application;
FIG. 2 is a flowchart of a text relationship detection method based on a graph convolutional neural network according to another embodiment of the present application;
FIG. 3 is a block diagram of a text relationship detection apparatus based on a graph convolutional neural network provided in one embodiment of the present application;
fig. 4 is a block diagram of a text relationship detection apparatus based on a graph convolutional neural network according to an embodiment of the present application.
[Detailed Description]
The following describes in further detail the embodiments of the present application with reference to the drawings and examples. The following examples are illustrative of the application and are not intended to limit the scope of the application.
First, several terms related to the present application will be described.
Optical character recognition (Optical Character Recognition, OCR): is a recognition technology for converting information in an image into characters.
Text block node: a text block segmented according to a certain threshold; it includes the text content, the text position and the related picture background.
Key information block: consists of a set of text block nodes of a known type; these are the valuable pieces of information in the document, such as price, weight, etc.
Node features: the feature information of a key information block, obtained by encoding its text content and type.
Edge characteristics: refers to the characteristic information of the edges connected between the key information blocks.
Sub-edge features: the characteristic information of the edges connected among the text block nodes is obtained by encoding from the position information of the nodes.
Information extraction model: a model that takes text block nodes as input and extracts the key information blocks in a text file; a graph convolutional network is its core component.
Recurrent neural network (Recurrent Neural Network, RNN): is a special neural network structure, which consists of an input layer, a hidden layer and an output layer.
Summary model: an artificially designed neural network structure that takes a group of feature vectors as input and outputs a fixed-dimension vector representing the semantic information of the group.
Graph convolutional neural network (Graph Convolutional Network, GCN): a neural network applied to graphs; networks using graph convolution can be applied to graph embedding (Graph Embedding, GE), also known as network embedding.
A graph G = (V, E) consists of a set of nodes V and a set of edges E. Each node i has a feature x_i, and the features of all nodes can be represented by a matrix X ∈ R^{N×D}, where N is the number of nodes and D is the number of features per node, i.e. the dimension of each feature vector.
Graph convolution refers to the process of determining the feature representation of the current node from its surrounding nodes. The surrounding nodes may be the neighbor nodes of the current node, i.e. the nodes having an association relation with it, or the neighbors of those neighbor nodes, and so on; the application does not limit the type of surrounding nodes.
The graph convolution can be represented by the following nonlinear function:
H^{l+1} = f(H^l, A)
where H^0 = X is the input of the first layer, X ∈ R^{N×D}, N is the number of nodes of the graph, D is the dimension of each node feature vector, and A is the adjacency matrix; the function f of different graph convolutional neural networks may be the same or different.
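The text leaves the propagation rule f open. A minimal sketch of one common choice — the symmetrically normalized rule ReLU(D^{-1/2}(A+I)D^{-1/2} H W), used here purely as an illustrative assumption, not as the patent's actual function — is:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: H_{l+1} = f(H_l, A).

    Assumed form of f: ReLU(D^-1/2 (A+I) D^-1/2 @ H @ W),
    where I adds self-loops and D is the degree matrix of A+I.
    """
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)                        # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))       # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # normalized adjacency
    return np.maximum(A_norm @ H @ W, 0.0)       # ReLU activation

# toy graph: N = 3 nodes, D = 4 input features, 2 output features
H0 = np.random.rand(3, 4)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
W = np.random.rand(4, 2)
H1 = gcn_layer(H0, A, W)
print(H1.shape)  # (3, 2)
```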
Optionally, the present application is described by taking the execution body of each embodiment as an electronic device with computing capability as an example, where the electronic device may be a terminal or a server, and the terminal may be a computer, a notebook computer, a tablet computer, etc., and the embodiment does not limit the type of the terminal and the type of the electronic device.
The text relation detection method provided by this embodiment is suitable for identifying the association relations between key information blocks in a text file. Each key information block corresponds to a named entity. For example, a bill file may comprise four kinds of key information blocks: the product's country of origin, the product quantity, the product unit price and the product total price. There may be several key information blocks of each kind, and this embodiment can detect the key information blocks that have an association relation (such as belonging to the same product).
In practical application, the key information block may be a key information block in various documents such as value-added tax receipts, insurance policies, etc., or may be a key information block in other types of files, such as a key information block in a document image, etc., and the embodiment does not limit the types of the key information blocks.
Based on this, if the key information blocks are of several types and there are several key information blocks of each type, accurately identifying the key information blocks that have an association relation becomes the problem to be solved. Accordingly, the text relation detection scheme provided by the application takes the key information blocks in a text file (such as a document or certificate file) as input, where each key information block consists of a group of text block nodes of a known type; an undirected graph is constructed with the key information blocks as nodes and the connections between them as edges, and a graph neural network encodes and learns features on this undirected graph and predicts the types of the edge connections; the nodes and edges input to the graph neural network require feature information extracted from the key information blocks — node features are extracted from the key information blocks themselves, and edge features from the edge connectivity relations between them; the node features and edge features are input into the constructed graph neural network architecture to encode and learn edge relation feature vectors; finally, the edge types are predicted from these vectors and aggregated to obtain the association relations between the key information blocks.
Because the association relation can be automatically detected according to the characteristics of the key information blocks and the edge characteristics, the problem that the detection result is unreasonable due to the fact that the association relation is determined by setting the logic rule based on the position information can be solved, and the accuracy of the association relation identification is improved.
In addition, the relation detection method provided by the application does not need to distinguish the key and the value of the key information block, and can directly detect the association relation, so that the efficiency of the association relation detection can be improved.
The text relation detection method based on the graph convolution neural network provided by the application is described in detail below.
Fig. 1 is a flowchart of a text relationship detection method based on a graph convolutional neural network according to an embodiment of the present application. The method comprises at least the following steps:
step 101, acquiring a plurality of key information blocks of text information in a target image, wherein the key information blocks comprise a plurality of text blocks, and each text block comprises at least one character string.
A key information block is information to be extracted from the text information of the target image, i.e. information the user focuses on.
Optionally, the key information blocks may be identified by the electronic device, for example, using an information extraction model stored in the electronic device, and using text block nodes as input to extract a model of the key information blocks in the text file; or sent by other devices, the embodiment does not limit the acquisition mode of the key information block.
In this embodiment, a key information block is composed of a set of text blocks of a known type. A text block contains two kinds of information: the text content of the text block and the position information of the text block.
Optionally, the position information is a text box defined by the top-left and bottom-right corners of the text block. Alternatively, the position information may be each pixel position corresponding to the text block; this embodiment does not limit the implementation of the position information.
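As an illustration, a text block carrying box-style position information can be represented by a minimal structure like the following. The field names and the `center` helper are hypothetical; the patent does not prescribe a data layout.

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str     # recognized character string
    box: tuple    # (x1, y1, x2, y2): top-left and bottom-right corners

    def center(self):
        """Center point of the text box, useful for relative-position features."""
        x1, y1, x2, y2 = self.box
        return ((x1 + x2) / 2, (y1 + y2) / 2)

blk = TextBlock("USD 12.50", (100, 40, 180, 60))
print(blk.center())  # (140.0, 50.0)
```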
Optionally, the electronic device may also perform data format conversion on the key information block, so that the converted data format is suitable for subsequent steps. Data format conversion includes, but is not limited to: the specific content of the data format conversion is not limited in this embodiment, such as the association index between the key information block and the text block node, and the formatting of the text block node position coordinates.
And 102, inputting the character strings of each text block in each key information block into a node characteristic extraction model to obtain the node characteristics of the key information blocks.
In this embodiment, the node characteristics of the key information blocks are extracted and obtained using the type information and a corresponding set of text blocks.
In one example, inputting the character string of each text block in each key information block into the node feature extraction model to obtain the node feature of the key information block includes: for each text block, inputting character strings in the text block into a pre-trained RNN to obtain a feature vector of each character string; and processing the feature vector of each character string into a second fixed dimension to obtain node features.
Specifically, for the group of character strings corresponding to the text blocks input to the node feature extraction model, the character strings are first vectorized character by character to obtain character vector representations; the character vectors are then encoded by the RNN and weighted and summed into a fixed-dimension feature vector by the Summary model, so that this feature vector represents the text features of the text block node.
The above example describes the node feature extraction model as comprising an RNN and a Summary model. In actual implementation, the node feature extraction model may be another type of neural network; for example, the feature vector of a character string may be computed by a linear regression model or by word2vec. This embodiment does not limit the implementation of the node feature extraction model.
Optionally, the electronic device randomly initializes a vector table of node types in advance, looks up a vector in the table by the index subscript corresponding to the node type of a key information block, and uses that vector to characterize the type features of the key information block. The type features are then expanded to the same dimension as the text features and spliced with them to form a group of new feature vectors, which are encoded by the RNN and Summary model into a feature vector of fixed dimension; this extracted vector represents the node features of the key information block.
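The node-feature pipeline just described — character-by-character vectorization, RNN encoding, weighted Summary pooling into a fixed dimension — can be sketched roughly as follows. All sizes, the hash-based character lookup and the single-layer RNN with softmax pooling are illustrative assumptions, not the patent's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, CHAR_DIM, HID = 128, 16, 32                   # assumed sizes

char_emb = rng.normal(size=(VOCAB, CHAR_DIM))        # character vector table
Wx = rng.normal(size=(CHAR_DIM, HID)) * 0.1          # RNN input weights
Wh = rng.normal(size=(HID, HID)) * 0.1               # RNN recurrent weights
w_att = rng.normal(size=(HID,))                      # Summary weighting vector

def encode_text_block(s):
    """Encode one text block's string into a fixed-dimension feature vector."""
    h = np.zeros(HID)
    states = []
    for ch in s:                                     # character by character
        x = char_emb[ord(ch) % VOCAB]                # character vectorization
        h = np.tanh(x @ Wx + h @ Wh)                 # simple RNN step
        states.append(h)
    S = np.stack(states)                             # (T, HID)
    a = np.exp(S @ w_att)                            # Summary weights (softmax)
    a /= a.sum()
    return a @ S                                     # weighted sum -> (HID,)

feat = encode_text_block("Total: 12.50")
print(feat.shape)  # (32,)
```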
Step 103, for each of the plurality of key information blocks, constructing a connectivity relation between each text block in the key information block and each text block in the other key information blocks.
A key information block is composed of a set of text blocks, so multiple pieces of text position information exist. One strategy is to construct a position area large enough to cover all its text block nodes, but such an area is not described precisely enough and may overlap the position areas of other key information nodes. Therefore, in this embodiment, the connectivity relation between two key information blocks is defined as the sum of the connectivity relations between their two groups of text block nodes. That is, if the first key information block contains M text blocks and the second contains N text blocks, there are M×N text-block connectivity relations between the two key information blocks, and the sum of these is the connectivity relation between the two key information nodes, where M and N are positive integers greater than or equal to 1.
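The M×N pairing described above can be sketched with a hypothetical helper (not from the patent) that enumerates every text-block pair between two key information blocks:

```python
from itertools import product

def build_connectivity(block_a, block_b):
    """All M*N text-block index pairs between two key information blocks."""
    return list(product(range(len(block_a)), range(len(block_b))))

a = ["2021-03-01", "invoice"]      # first key information block: M = 2 text blocks
b = ["USD", "12.50", "total"]      # second key information block: N = 3 text blocks
edges = build_connectivity(a, b)
print(len(edges))  # 6 = M * N
```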
Step 104, determining the edge features of each key information block based on the connectivity relations corresponding to that key information block and the position information corresponding to each connectivity relation.
Optionally, steps 103 and 104 may be performed after step 102, before step 102, or simultaneously with step 102; this embodiment does not limit the execution order between steps 103-104 and step 102.
In one example, determining the edge feature of each key information block based on its connectivity relations and the position information of each connectivity relation includes: for each connectivity relation, determining the sub-edge feature corresponding to that relation according to the relative position between the two text blocks it connects; for each key information block, determining the sub-edge features corresponding to the connectivity relations of each of its text blocks; and generating the edge feature based on the sub-edge features corresponding to each key information block.
For each connectivity relation, determining the corresponding sub-edge feature according to the relative position between the two connected text blocks includes: discretizing the relative position by direction and distance to obtain a direction code and distance codes; inputting these codes into an embedding model to obtain a direction embedding, a horizontal distance embedding, and a vertical distance embedding; and splicing the three embeddings and projecting the result onto a fixed-length vector, which is the sub-edge feature.
The embedding model may be a pre-trained embedding layer.
For example, the relative position (center-line vector) of two text blocks with a connectivity relation is discretized by direction and distance. Discretization here means: the direction is divided by angle into a number of bins (such as 360 directions, with adjacent directions differing by 1 degree); for distance, the vertical and horizontal distances are divided by the height and width of the target image to obtain normalized distances, which are then multiplied by 1000 and rounded. This yields an integer code for the direction and integer codes for the distances. The embedding model then maps the direction, horizontal, and vertical integer codes to three corresponding embeddings; these embeddings are spliced and projected onto a fixed-length vector used as the edge feature of the directed graph.
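The discretization just described can be sketched as follows. The 360 direction bins and the factor of 1000 follow the example in the text; the function name and argument layout are assumptions:

```python
import math

def discretize(dx, dy, img_w, img_h, n_dirs=360):
    # Direction: one-degree bins over the center-line vector's angle.
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    direction_code = int(angle * n_dirs / 360.0) % n_dirs
    # Distance: normalize by image width/height, scale by 1000, round.
    horizontal_code = round(dx / img_w * 1000)
    vertical_code = round(dy / img_h * 1000)
    return direction_code, horizontal_code, vertical_code

codes = discretize(dx=100, dy=50, img_w=1000, img_h=500)
print(codes)  # (26, 100, 100)
```

The three integer codes would then be looked up in the embedding layer to produce the direction, horizontal, and vertical embeddings.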
Optionally, edge feature extraction proceeds in two steps: sub-edge features between text blocks are extracted first, and the edge features between key information blocks are then obtained by looking up a vector table of sub-edge features and encoding the retrieved vectors.
Obtaining the edge features by lookup and encoding from the vector table of sub-edge features includes: for each key information block, acquiring a connectivity-relation matching table formed by the connectivity relations of that key information block; and, according to this matching table, retrieving the corresponding set of sub-edge feature vectors from the vector table formed by the sub-edge features to obtain the edge feature of the key information block.
Optionally, after obtaining the set of sub-edge feature vectors, the electronic device processes each sub-edge feature into the same dimension, and then processes the result into a first fixed dimension to obtain the edge feature.
An example of determining edge characteristics of a key information block is described below:
1) Calculate the sub-edge feature vectors between text blocks, and build a vector table of sub-edge feature vectors from the sub-edge connectivity between text blocks.
2) Look up the sub-edge connectivity matching table between key information blocks in the sub-edge feature vector table to obtain a vectorized representation.
3) Expand the vectorized set of sub-edge features between key information blocks into the same dimension and process it through the Summary model into a feature vector of a first fixed dimension; this vector represents the edge feature of the key information block.
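Steps 2) and 3) above amount to a table lookup followed by a summary step, which can be sketched as follows. The table sizes, the mean-pool-plus-projection stand-in for the Summary model, and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical vector table: one row per text-block sub-edge.
SUB_EDGE_TABLE = rng.normal(size=(100, 24))
W_PROJ = rng.normal(size=(24, 12))   # projection to the first fixed dimension

def edge_feature(sub_edge_indices):
    # Gather the sub-edge vectors for one key-information-block pair from
    # the table, then summarize them into a single fixed-dimension edge
    # feature (mean pooling + projection stands in for the Summary model).
    sub = SUB_EDGE_TABLE[sub_edge_indices]
    return sub.mean(axis=0) @ W_PROJ

e = edge_feature([3, 17, 42])   # e.g. M * N = 3 connected text-block pairs
print(e.shape)  # (12,)
```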
Step 105, inputting the node features and the edge features into a pre-trained graph convolutional neural network to obtain the edge types between key information blocks.
The inputs of the graph convolutional neural network are the node features and edge features, and the process of computing the edge types includes: for each key information block, computing target node information with the graph convolutional neural network from the node features of that key information block together with the node features and edge features of the key information blocks connected to it; and then splicing the information of the nodes associated with each edge and computing the attribute of the edge through a multi-layer feed-forward network to obtain the edge type.
The process of computing the target node information and the edge attributes is repeated several times, which enlarges the receptive field and yields higher-level semantic features.
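A minimal sketch of one such repeated step, with a plain loop in place of a trained graph convolution layer; the weights are random placeholders, not learned parameters:

```python
import numpy as np

def gcn_step(node_feats, edges, edge_feats, W):
    # Each node aggregates messages built from a connected neighbor's node
    # feature spliced with the connecting edge feature, then projected.
    out = node_feats.copy()
    for (i, j), e in zip(edges, edge_feats):
        msg = np.concatenate([node_feats[j], e]) @ W
        out[i] += msg
    return out

rng = np.random.default_rng(2)
nodes = rng.normal(size=(4, 8))          # 4 key information blocks
edges = [(0, 1), (1, 2), (2, 3)]         # connectivity relations
edge_feats = rng.normal(size=(3, 4))
W = rng.normal(size=(12, 8))             # random placeholder, not learned

h = gcn_step(nodes, edges, edge_feats, W)
h = gcn_step(h, edges, edge_feats, W)    # repeating enlarges the receptive field
print(h.shape)  # (4, 8)
```

Stacking two steps lets node 0 receive information from node 2 even though they share no direct edge, which is the "enlarged field of view" the text refers to.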
Step 106, determining that the key information blocks with the same edge type have an association relationship.
Key information blocks connected by edges of the same type are aggregated into a group, and the group indicates that a certain association relationship, namely the detected edge type, exists among those key information blocks. The key information blocks and the edge-type prediction results are aggregated into a set of typed node relations; this relation set is the structured information extraction result, i.e., the relationship among the texts to be detected.
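The aggregation by edge type can be sketched as a per-type union-find over the predicted edges; the block names and the edge type below are hypothetical:

```python
from collections import defaultdict

def group_by_edge_type(typed_edges):
    # Blocks joined by edges of the same predicted type end up in one
    # connected component; each type gets its own union-find structure.
    by_type = defaultdict(list)
    for a, b, t in typed_edges:
        by_type[t].append((a, b))

    result = {}
    for t, pairs in by_type.items():
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path compression
                x = parent[x]
            return x

        for a, b in pairs:
            parent[find(a)] = find(b)          # union the two components
        groups = defaultdict(set)
        for node in parent:
            groups[find(node)].add(node)
        result[t] = sorted(sorted(g) for g in groups.values())
    return result

# Hypothetical predictions: (block_i, block_j, edge_type)
edges = [("name", "Alice", "key-value"),
         ("date", "2021-03-01", "key-value"),
         ("Alice", "Ms.", "key-value")]
groups = group_by_edge_type(edges)
print(groups["key-value"])  # [['2021-03-01', 'date'], ['Alice', 'Ms.', 'name']]
```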
To make the text relationship detection method based on the graph convolutional neural network provided by the present application clearer, an example is described below with reference to fig. 2. The key information blocks in a document, as illustrated in fig. 2, serve as input. Text features are extracted from the text content of the key information blocks, and the type features of the key information blocks are encoded; the text and type features of each key information block are combined and encoded to obtain the node features. All key information blocks are disassembled to obtain a candidate set of text blocks; sub-edge features between every pair of text blocks are extracted from the position information of the text blocks, and a sub-edge feature vector table is built from them. The edge feature between any two key information blocks is regarded as an aggregation of the sub-edge features between the two corresponding groups of text blocks; that is, each edge feature consists of a group of sub-edge features, which are retrieved from the sub-edge feature vector table by the indices of that group of sub-edges. After the group of sub-edge feature vectors corresponding to an edge is obtained, it is encoded to extract the edge feature between the key information blocks. The key information block features and the edge features between key information blocks are encoded by the constructed graph neural network layers into feature vectors that learn the edge relations; the type of edge connecting each pair of key information blocks is detected from these feature vectors, and aggregation by edge type yields a set of typed key information block groups.
Optionally, steps 101 to 105 may be implemented in one network model, whose input is the key information blocks and whose output is the edge types.
In summary, the text relationship detection method based on the graph convolutional neural network provided by this embodiment acquires a plurality of key information blocks of the text information in a target image, where each key information block includes a plurality of text blocks and each text block includes at least one character string; inputs the character strings of each text block in each key information block into a node feature extraction model to obtain the node features of the key information blocks; constructs, for each key information block, a connectivity relation between each of its text blocks and each text block in the other key information blocks; determines the edge features of the key information blocks based on their connectivity relations and the position information corresponding to each relation; inputs the node features and edge features into a pre-trained graph convolutional neural network to obtain the edge types between key information blocks; and determines that key information blocks with the same edge type have an association relationship. This solves the problem of unreasonable detection results caused by determining association relationships through logic rules based on position information, and improves the accuracy of association relationship recognition. Moreover, the relation detection method provided by the application does not need to distinguish the keys and values of the key information blocks and can detect association relationships directly, which improves the efficiency of association relationship detection.
Fig. 3 is a block diagram of a text relationship detection apparatus based on a graph convolutional neural network according to an embodiment of the present application. The device at least comprises the following modules: the system comprises a key information acquisition module 310, a node feature extraction module 320, a connectivity relation construction module 330, an edge feature extraction module 340, an edge type calculation module 350 and an association relation determination module 360.
A key information obtaining module 310, configured to obtain a plurality of key information blocks of text information in a target image, where the key information blocks include a plurality of text blocks, and each text block includes at least one character string;
the node feature extraction module 320 is configured to input a character string of each text block in each key information block into a node feature extraction model to obtain node features of the key information block;
a connectivity relation construction module 330, configured to construct, for each of the plurality of key information blocks, a connectivity relation between each text block in the key information block and each text block in other key information blocks;
an edge feature extraction module 340, configured to determine an edge feature of each key information block based on each connected relation corresponding to the key information block and position information corresponding to each connected relation;
the edge type calculation module 350 is configured to input the node feature and the edge feature into a pre-trained graph convolution neural network to obtain an edge type between each key information block;
the association determination module 360 is configured to determine that the key information blocks with the same edge type have an association.
For relevant details reference is made to the method embodiments described above.
It should be noted that the text relationship detection device based on the graph convolutional neural network provided in the above embodiment is illustrated only with the division of the above functional modules; in practical applications, these functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device provided in the above embodiment and the embodiment of the text relationship detection method based on the graph convolutional neural network belong to the same concept; the detailed implementation process of the device is described in the method embodiment and is not repeated here.
Fig. 4 is a block diagram of a text relationship detection apparatus based on a graph convolutional neural network according to an embodiment of the present application. The apparatus comprises at least a processor 401 and a memory 402.
Processor 401 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 401 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor: the main processor, also called CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 401 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 401 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 402 may include one or more computer-readable storage media, which may be non-transitory. Memory 402 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 402 is used to store at least one instruction for execution by processor 401 to implement the method for graph-convolution neural network-based text relationship detection provided by the method embodiments of the present application.
In some embodiments, the text relationship detection device based on the graph convolutional neural network optionally further includes a peripheral interface and at least one peripheral. The processor 401, the memory 402, and the peripheral interface may be connected by buses or signal lines, and each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripherals include, but are not limited to: radio frequency circuitry, a touch display screen, audio circuitry, and a power supply.
Of course, the text relationship detection device based on the graph convolutional neural network may also include fewer or more components; this embodiment does not limit this.
Optionally, the present application further provides a computer readable storage medium, where a program is stored, where the program is loaded and executed by a processor to implement the text relation detection method based on the graph convolutional neural network in the above method embodiment.
Optionally, the present application further provides a computer product, where the computer product includes a computer readable storage medium, where a program is stored, where the program is loaded and executed by a processor to implement the text relation detection method based on the graph convolutional neural network in the above method embodiment.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that contains no contradiction should be considered within the scope of this specification.
The above examples illustrate only a few embodiments of the application and are described in detail, but they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and improvements without departing from the spirit of the application, and these all fall within the protection scope of the application. Accordingly, the scope of protection of the present application is determined by the appended claims.
The foregoing is merely one specific embodiment of the application, and any modifications made in light of the above teachings are intended to fall within the scope of the application.

Claims (8)

1. A text relationship detection method based on a graph convolution neural network, the method comprising:
acquiring a plurality of key information blocks of text information in a target image, wherein the key information blocks comprise a plurality of text blocks, and each text block comprises at least one character string;
inputting the character strings of each text block in each key information block into a node characteristic extraction model to obtain node characteristics of the key information blocks;
for each key information block in the plurality of key information blocks, constructing a connectivity relation between each text block in that key information block and each text block in the other key information blocks;
determining the edge features of each key information block based on the connectivity relations corresponding to that key information block and the position information corresponding to each connectivity relation; specifically: for each connectivity relation, determining the sub-edge feature corresponding to that relation according to the relative position between the two text blocks it connects; for each key information block, determining the sub-edge features corresponding to the connectivity relations of each of its text blocks; and generating the edge features based on the sub-edge features corresponding to each key information block;
wherein, for each connectivity relation, determining the sub-edge feature corresponding to that relation according to the relative position between the two connected text blocks includes: discretizing the relative position by direction and distance to obtain a direction code and distance codes; inputting the codes into an embedding model to obtain a direction embedding, a horizontal distance embedding and a vertical distance embedding; and splicing the direction embedding, the horizontal distance embedding and the vertical distance embedding and projecting the result onto a fixed-length vector to obtain the sub-edge feature;
inputting the node characteristics and the edge characteristics into a pre-trained graph convolution neural network to obtain edge types among key information blocks;
and determining that the key information blocks with the same edge type have an association relationship.
2. The method of claim 1, wherein generating the edge feature based on the sub-edge feature corresponding to each key information block comprises:
processing each sub-edge feature into the same dimension;
and processing each processed sub-edge feature into a first fixed dimension to obtain the edge feature.
3. The method of claim 1, wherein generating the edge feature based on the sub-edge feature corresponding to each key information block comprises:
for each key information block, acquiring a connectivity-relation matching table formed by the connectivity relations of that key information block;
and, according to the connectivity-relation matching table, retrieving the corresponding set of sub-edge feature vectors from a vector table formed by the sub-edge features to obtain the edge feature of the key information block.
4. The method according to claim 1, wherein inputting the character string of each text block in each key information block into the node feature extraction model, obtaining the node feature of the key information block, comprises:
for each text block, inputting the character strings in the text block into a pre-trained recurrent neural network (RNN) to obtain a feature vector for each character string;
and processing the feature vector of each character string into a second fixed dimension to obtain the node feature.
5. The method of claim 1, wherein said inputting the node features and the edge features into a pre-trained graph convolution neural network results in edge types between key information blocks, comprising:
for each key information block, computing target node information through the graph convolutional neural network from the node features of that key information block and the node features and edge features of the key information blocks having a connectivity relation with it;
and splicing information of each node associated with the edge, and calculating the attribute of the edge through a multi-layer forward network to obtain the edge type.
6. A text relationship detection apparatus based on a graph convolutional neural network, the apparatus comprising:
the key information acquisition module is used for acquiring a plurality of key information blocks of text information in a target image, wherein the key information blocks comprise a plurality of text blocks, and each text block comprises at least one character string;
the node characteristic extraction module is used for inputting the character strings of each text block in each key information block into the node characteristic extraction model to obtain the node characteristics of the key information block;
the connectivity relation construction module is used for constructing, for each key information block in the plurality of key information blocks, a connectivity relation between each text block in that key information block and each text block in the other key information blocks;
the edge feature extraction module is used for determining the edge features of each key information block based on the connectivity relations corresponding to that key information block and the position information corresponding to each connectivity relation; specifically: for each connectivity relation, determining the sub-edge feature corresponding to that relation according to the relative position between the two text blocks it connects; for each key information block, determining the sub-edge features corresponding to the connectivity relations of each of its text blocks; and generating the edge features based on the sub-edge features corresponding to each key information block;
wherein, for each connectivity relation, determining the sub-edge feature corresponding to that relation according to the relative position between the two connected text blocks includes: discretizing the relative position by direction and distance to obtain a direction code and distance codes; inputting the codes into an embedding model to obtain a direction embedding, a horizontal distance embedding and a vertical distance embedding; and splicing the direction embedding, the horizontal distance embedding and the vertical distance embedding and projecting the result onto a fixed-length vector to obtain the sub-edge feature;
the edge type calculation module is used for inputting the node characteristics and the edge characteristics into a pre-trained graph convolution neural network to obtain edge types among the key information blocks;
and the association relation determining module is used for determining that the key information blocks with the same edge type have association relation.
7. A text relationship detection device based on a graph convolutional neural network, characterized by comprising a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the text relationship detection method based on the graph convolutional neural network according to any one of claims 1 to 5.
8. A computer-readable storage medium, wherein a program is stored in the storage medium, which when executed by a processor is configured to implement the graph-convolution neural network-based text relationship detection method according to any one of claims 1 to 5.
CN202110224515.XA 2021-03-01 2021-03-01 Text relation detection method, device and storage medium based on graph convolution neural network Active CN112949476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110224515.XA CN112949476B (en) 2021-03-01 2021-03-01 Text relation detection method, device and storage medium based on graph convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110224515.XA CN112949476B (en) 2021-03-01 2021-03-01 Text relation detection method, device and storage medium based on graph convolution neural network

Publications (2)

Publication Number Publication Date
CN112949476A CN112949476A (en) 2021-06-11
CN112949476B true CN112949476B (en) 2023-09-29

Family

ID=76246856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110224515.XA Active CN112949476B (en) 2021-03-01 2021-03-01 Text relation detection method, device and storage medium based on graph convolution neural network

Country Status (1)

Country Link
CN (1) CN112949476B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037985A (en) * 2021-11-04 2022-02-11 北京有竹居网络技术有限公司 Information extraction method, device, equipment, medium and product
CN114283403B (en) * 2021-12-24 2024-01-16 北京有竹居网络技术有限公司 Image detection method, device, storage medium and equipment
CN114219876B (en) * 2022-02-18 2022-06-24 阿里巴巴达摩院(杭州)科技有限公司 Text merging method, device, equipment and storage medium
CN115116060B (en) * 2022-08-25 2023-01-24 深圳前海环融联易信息科技服务有限公司 Key value file processing method, device, equipment and medium

Citations (12)

Publication number Priority date Publication date Assignee Title
CA2185827A1 (en) * 1991-12-23 1993-06-24 Chinmoy Bhusan Bose Method and Apparatus for Connected and Degraded Text Recognition
CN109062874A (en) * 2018-06-12 2018-12-21 平安科技(深圳)有限公司 Acquisition methods, terminal device and the medium of financial data
CN110825845A (en) * 2019-10-23 2020-02-21 中南大学 Hierarchical text classification method based on character and self-attention mechanism and Chinese text classification method
CN111553837A (en) * 2020-04-28 2020-08-18 武汉理工大学 Artistic text image generation method based on neural style migration
CN111581377A (en) * 2020-04-23 2020-08-25 广东博智林机器人有限公司 Text classification method and device, storage medium and computer equipment
CN111597943A (en) * 2020-05-08 2020-08-28 杭州火石数智科技有限公司 Table structure identification method based on graph neural network
CN111652162A (en) * 2020-06-08 2020-09-11 成都知识视觉科技有限公司 Text detection and identification method for medical document structured knowledge extraction
CN111784802A (en) * 2020-07-30 2020-10-16 支付宝(杭州)信息技术有限公司 Image generation method, device and equipment
CN111860257A (en) * 2020-07-10 2020-10-30 上海交通大学 Table identification method and system fusing multiple text features and geometric information
CN111967387A (en) * 2020-08-17 2020-11-20 北京市商汤科技开发有限公司 Form recognition method, device, equipment and computer readable storage medium
CN112215236A (en) * 2020-10-21 2021-01-12 科大讯飞股份有限公司 Text recognition method and device, electronic equipment and storage medium
CN112241481A (en) * 2020-10-09 2021-01-19 中国人民解放军国防科技大学 Cross-modal news event classification method and system based on graph neural network

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10839291B2 (en) * 2017-07-01 2020-11-17 Intel Corporation Hardened deep neural networks through training from adversarial misclassified data
KR102535411B1 (en) * 2017-11-16 2023-05-23 삼성전자주식회사 Apparatus and method related to metric learning based data classification


Non-Patent Citations (3)

Title
A frame type detection method based on the BERT model; Gao Lizheng, Zhou Gang, Luo Junyong, Huang Yongzhong; Journal of Information Engineering University (Issue 02); full text *
A survey of research on image-based information hiding detection algorithms and implementation techniques; Xia Yu, Lang Rongling, Cao Weibing, Dai Guanzhong; Journal of Computer Research and Development (Issue 04); full text *
Research on entity relation extraction methods for genealogy texts; Ren Ming, Xu Guang, Wang Wenxiang; Journal of Chinese Information Processing (Issue 06); full text *

Also Published As

Publication number Publication date
CN112949476A (en) 2021-06-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant