CN112949477A - Information identification method and device based on graph convolution neural network and storage medium


Info

Publication number
CN112949477A
Authority
CN
China
Prior art keywords
character
information
text
text block
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110224516.4A
Other languages
Chinese (zh)
Other versions
CN112949477B (en)
Inventor
侯绍东
熊玉竹
周以晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Meinenghua Intelligent Technology Co ltd
Original Assignee
Suzhou Meinenghua Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Meinenghua Intelligent Technology Co ltd filed Critical Suzhou Meinenghua Intelligent Technology Co ltd
Priority to CN202110224516.4A priority Critical patent/CN112949477B/en
Publication of CN112949477A publication Critical patent/CN112949477A/en
Application granted granted Critical
Publication of CN112949477B publication Critical patent/CN112949477B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The application relates to an information identification method and device based on a graph convolution neural network, and a storage medium, belonging to the technical field of computers. The method comprises the following steps: obtaining semantic features of text blocks in a target image and visual features between different text blocks; inputting the feature information of each text block into a first graph convolution neural network to obtain a text block type and a hidden vector, the feature information comprising the semantic features of the text block, the semantic features of its associated text blocks, and the visual features between the text block and the associated text blocks; inputting the hidden vector of the text block, the text block type and character feature information of each character into a preset character model to obtain the character type of the character; inputting the character information of each character into a second graph convolution neural network to obtain edge attributes; and identifying entity blocks based on the edge attributes. This solves the problem of low accuracy when information identification relies on semantic features alone; type inference is performed by combining semantic and spatial features, improving the accuracy of information identification.

Description

Information identification method and device based on graph convolution neural network and storage medium
Technical Field
The application relates to an information identification method, an information identification device and a storage medium based on a graph convolution neural network, and belongs to the technical field of computers.
Background
Information recognition (or named entity recognition) is a fundamental problem in the field of natural language processing. Briefly, named entity recognition locates and categorizes the entities of interest contained in a text sequence, for example, extracting key information from documents such as bills and logistics sheets.
Currently, information identification methods include: performing named entity recognition based on a linear-chain model over a language model such as BERT (Bidirectional Encoder Representations from Transformers).
However, the text to be recognized may have domain-specific semantics, an irregular layout, irregular word segmentation granularity, and the like, so the accuracy of such information recognition may not be high.
Disclosure of Invention
The application provides an information identification method, an information identification device and a storage medium based on a graph convolution neural network, which can solve the problem of low identification accuracy when information identification is carried out only by using semantic features. The application provides the following technical scheme:
in a first aspect, an information identification method based on a graph convolution neural network is provided, the method including:
acquiring a target image with text information;
dividing the text information into a plurality of text blocks, wherein each text block comprises at least one character;
obtaining semantic features of each text block in the plurality of text blocks and visual features among different text blocks;
inputting the characteristic information of each text block into a pre-trained first graph convolution neural network to obtain the text block type and the hidden vector of the text block; the feature information includes semantic features of the text block, semantic features of an associated text block associated with the text block, and visual features between the text block and the associated text block;
for each character in each text block, inputting the hidden vector of the text block, the type of the text block and the character feature information of the character into a preset character model to obtain the character type of the character; the character characteristic information is used for indicating the position of the character;
inputting the character information of each character into a pre-trained second graph convolution neural network to obtain the edge attribute between the character and the associated character; the character information comprises type embedded codes of character types of the characters and visual features between the characters and associated characters;
and splicing the characters with the same edge attributes according to the association relationship to obtain the identified entity block.
Optionally, the obtaining semantic features of each text block in the plurality of text blocks includes:
for each text block, inputting the character strings in the text block into a pre-trained first Recurrent Neural Network (RNN) to obtain a feature vector of each character string;
and determining semantic features corresponding to the text blocks based on the feature vector of each character string.
Optionally, the determining semantic features corresponding to the text block based on the feature vector of each character string includes:
splicing the feature vectors of each character string to obtain the semantic features;
or,
splicing the feature vectors of each character string to obtain a splicing feature; acquiring a grid feature of the target image based on a residual network (ResNet) and atrous spatial pyramid pooling (ASPP); and mixing the splicing feature and the grid feature to obtain the semantic feature.
Optionally, acquiring visual features between different text blocks includes:
discretizing the relative position between the text block and the associated text block according to the direction and the distance to obtain a direction code and a distance code;
inputting the direction code and the distance code into a first embedding model to obtain a direction embedding code, a horizontal distance embedding code and a vertical distance embedding code;
and after splicing the direction embedded code, the horizontal distance embedded code and the vertical distance embedded code, projecting the spliced codes to a vector with a fixed length to obtain the visual features.
Optionally, the inputting the feature information of each text block into a pre-trained first graph convolution neural network to obtain a text block type and a hidden vector of the text block includes:
performing projection transformation on the characteristic information through the first graph convolution neural network to obtain the weight and the updating information of each type of characteristic information, wherein the hidden vector comprises the updating information;
and superposing the update information and the weight corresponding to each type of feature information to obtain the text block type.
Optionally, the character feature information is a normalized feature of the character position; the inputting, for each character in each text block, the hidden vector of the text block, the text block type and the character feature information of the character into a preset character model to obtain the character type of the character includes:
splicing the hidden vector with the type embedded code of the text block type through the character model to obtain state information;
splicing the characters and the normalized features, and inputting the spliced features into a second RNN network to obtain hidden vectors of the characters;
and splicing the state information and the character hidden vector, and inputting the spliced result into a third RNN to obtain the character type of the character.
Optionally, the type embedded code is obtained by inputting the character type into a second embedding model; the inputting the character information of each character into a pre-trained second graph convolution neural network to obtain an edge attribute between the character and the associated character includes:
performing projection transformation on the type embedded code and the visual features through the second graph convolution neural network to obtain update information and a weight;
multiplying the weight by the updating information to obtain weighted updating information;
determining the edge attribute based on the weighted update information.
Optionally, the dividing the text information into a plurality of text blocks includes:
and dividing the text information according to the character spacing of each character in the text information to obtain the plurality of text blocks.
In a second aspect, an information identification apparatus based on a graph-convolution neural network is provided, the apparatus including a processor and a memory; the memory stores a program, which is loaded and executed by the processor to implement the information identification method based on the graph-convolution neural network according to the first aspect.
In a third aspect, a computer-readable storage medium is provided, in which a program is stored, and the program is loaded and executed by a processor to implement the information identification method based on the graph convolution neural network of the first aspect.
The beneficial effect of this application lies in: dividing text information in a target image into a plurality of text blocks; obtaining semantic features of each text block in a plurality of text blocks and visual features among different text blocks; inputting the characteristic information of each text block into a pre-trained first graph convolution neural network to obtain the text block type and the hidden vector of the text block; the feature information comprises semantic features of the text block, semantic features of an associated text block associated with the text block, and visual features between the text block and the associated text block; for each character in each text block, inputting the hidden vector of the text block, the type of the text block and character characteristic information of the character into a preset character model to obtain the character type of the character; the character characteristic information is used for indicating the position of the character; inputting the character information of each character into a pre-trained second graph convolution neural network to obtain edge attributes between the characters and the associated characters; the character information comprises type embedded codes of character types of the characters and visual features between the characters and the associated characters; splicing the characters with the same edge attributes according to the association relationship to obtain an identified entity block; the problem of low identification accuracy when information identification is carried out only by using semantic features can be solved; because the graph convolution neural network is used for both the text block level type inference and the character level type inference, the type inference can be carried out by combining semantic and spatial characteristics, and the accuracy of information identification can be improved. Optionally, type reasoning can be performed by combining three characteristics of semantics, images and spaces, so as to further improve the accuracy of information identification.
In addition, performing type inference with the graph convolution neural network shortens what would originally take several processing passes: recognition from text information to entity information is achieved in only two stages, which improves the efficiency of information identification.
In addition, consider building a directed graph that takes individual characters as nodes and a fixed distance as the neighborhood, and identifying the entities corresponding to the characters with a graph convolution neural network on that graph: the graph to be built has a large number of nodes and edges, the graph convolution neural network is difficult to train, and the memory footprint is excessive. For this reason, the present application divides type inference into two stages, and in the second stage, because the prediction logic and feature information of an edge are simple compared with the foregoing manner, the amount of computation of the graph convolution neural network can be reduced.
The foregoing is only an overview of the technical solutions of the present application. To make the technical solutions of the present application clearer and implementable in accordance with the contents of the description, preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a method for identifying information based on a graph convolution neural network according to an embodiment of the present application;
FIG. 2 is a block diagram of an information recognition apparatus based on a graph convolution neural network according to an embodiment of the present application;
fig. 3 is a block diagram of an information identification apparatus based on a graph convolution neural network according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present application, but not to limit its scope.
First, several terms referred to in the present application will be described.
Text block node: a text block segmented according to a certain spacing threshold; it comprises the text content, the text position and the related picture background.
Character node: obtained by segmenting a text block character by character; it comprises the character and the character position.
Type inference: predicting the entity type of each character.
Relationship inference: judging whether two character nodes belong to the same entity, so that all characters of the same entity can be spliced together.
Graph Convolutional Network (GCN): a neural network that applies convolution on graph-structured data; it can be used for Graph Embedding (GE).
A graph is denoted G = (V, E), where V is the set of nodes and E is the set of edges. Each node i has a feature x_i, and the features of all nodes can be represented by a matrix X of size N*D, where N is the number of nodes and D is the number of features of each node, i.e., the dimension of the feature vector.
Graph convolution refers to the process of determining the feature representation of the current node from the nodes around it. The surrounding nodes may be neighbor nodes of the current node, i.e., nodes having an association relationship with the current node, or the neighbors of those neighbor nodes, and so on; the present application does not limit the type of the surrounding nodes.
Graph convolution can be represented by the following nonlinear function:
H^(l+1) = f(H^l, A)
where H^0 = X is the input of the first layer, X ∈ R^(N*D), N is the number of nodes of the graph, D is the dimension of each node's feature vector, and A is the adjacency matrix. The functions f of different graph convolution neural networks may be the same or different.
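For illustration only (the disclosure does not prescribe a specific f), a common instantiation is the renormalized propagation rule H^(l+1) = ReLU(D^(-1/2) A_hat D^(-1/2) H^l W^l). A minimal NumPy sketch, with all names hypothetical:

    import numpy as np

    def gcn_layer(H, A, W):
        # One graph-convolution layer: aggregate neighbor features through the
        # renormalized adjacency matrix, then apply a linear map and a ReLU.
        A_hat = A + np.eye(A.shape[0])          # add self-loops
        d = A_hat.sum(axis=1)                   # node degrees
        D_inv_sqrt = np.diag(d ** -0.5)
        H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
        return np.maximum(H_next, 0.0)          # ReLU

    # Toy graph: N = 3 nodes, D = 4 features per node, hidden size 8.
    H0 = np.random.randn(3, 4)
    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    W0 = np.random.randn(4, 8)
    H1 = gcn_layer(H0, A, W0)                   # shape (3, 8)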
Atrous spatial pyramid pooling (ASPP) is used to capture a larger receptive field than a conventional Convolutional Neural Network (CNN). Atrous (dilated) convolution can be used to extract image features in a semantic segmentation task, achieving a large receptive field without reducing the resolution of the feature map too much.
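As a sketch of the ASPP idea (assuming PyTorch; the dilation rates and channel sizes are illustrative, not taken from the disclosure), parallel dilated convolutions enlarge the receptive field while the feature-map resolution is preserved:

    import torch
    import torch.nn as nn

    class ASPP(nn.Module):
        # Parallel 3x3 convolutions with different dilation rates capture
        # multi-scale context without shrinking the feature map.
        def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
            )
            self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

        def forward(self, x):
            y = torch.cat([b(x) for b in self.branches], dim=1)
            return self.project(y)

    feat = torch.randn(1, 256, 32, 32)   # e.g. a ResNet feature map
    grid = ASPP(256, 64)(feat)           # 32x32 resolution is preserved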
Optionally, the execution subject of each embodiment of the present application is an electronic device with computing capability. The electronic device may be a terminal or a server; the terminal may be a desktop computer, a notebook computer, a tablet computer, and the like. This embodiment does not limit the type of the terminal or of the electronic device.
The information identification method provided by the embodiment is suitable for identifying the information (or named entity) of interest in the target image with the text information. The target image may include contents of various format attributes such as a picture, a table, a background, and the like, in addition to text information. In practical application, the target image may be an image of various documents such as a value-added tax invoice and a policy, and of course, may also be other types of target images, such as a certificate image, and the embodiment does not limit the type of the target image.
The practical requirements behind the information identification provided by the present application are schematically illustrated below by two examples. In the first, the recognition target is the invoicing date of an invoice: various dates, such as the invoicing date and the bill generation date, may appear on an invoice, so it is often impossible to judge which one is the invoicing date from the date format alone.
In the second, the recognition target is the tax-free amount in a value-added tax invoice: the invoice carries a plurality of amounts, such as the amount of a single commodity, the tax-free amount and the total amount, all in numeric format, so the tax-free amount cannot be judged from the format alone.
If the target image thus contains multiple contents in the same format as the information to be identified, accurately identifying the named entity of interest becomes an urgent problem. Based on this, the information identification scheme provided by this application includes two stages: the first stage performs text-block-level type inference and character-level type inference on the text information in the target image; the second stage computes the edge attributes between characters based on the results of the first stage.
Because the graph convolution neural network is used for both the text block level type inference and the character level type inference, the type inference can be carried out by combining semantic and spatial characteristics, and the accuracy of information identification can be improved. Optionally, type reasoning can be performed by combining three characteristics of semantics, images and spaces, so as to further improve the accuracy of information identification.
In addition, performing type inference with the graph convolution neural network shortens what would originally take several processing passes: recognition from text information to entity information is achieved in only two stages, which improves the efficiency of information identification.
In addition, consider building a directed graph that takes individual characters as nodes and a fixed distance as the neighborhood, and identifying the entities corresponding to the characters with a graph convolution neural network on that graph: the graph to be built has a large number of nodes and edges, the graph convolution neural network is difficult to train, and the memory footprint is excessive. For this reason, the present application divides type inference into two stages, and in the second stage, because the prediction logic and feature information of an edge are simple compared with the foregoing manner, the amount of computation of the graph convolution neural network can be reduced.
The information identification method based on the graph convolution neural network provided by the application is described in detail below.
Fig. 1 is a flowchart of an information identification method based on a graph convolution neural network according to an embodiment of the present application. The method at least comprises the following steps:
step 101, acquiring a target image with text information.
The text information has entity information to be identified, the target image can be an image of an invoice, a certificate image, a ticket image and the like, and the embodiment does not limit the type of the target image.
Optionally, the target image can be acquired by the electronic device or sent by other devices; the target image may be a frame image in the video stream or a single image, and the source of the target image is not limited in this embodiment.
Optionally, the number of target images may be one or more; in this embodiment, steps 101 to 107 are performed in sequence for each target image.
Step 102, dividing the text information into a plurality of text blocks, wherein each text block comprises at least one character.
The electronic device uses a character recognition program to recognize the text information, and then divides the text information according to the character spacing of each character in the text information to obtain the plurality of text blocks.
The character recognition program is used to acquire the characters in the target image, for example: a plurality of pieces of text information in the target image are acquired using Optical Character Recognition (OCR). For another example: for a target image in PDF format, the text information in the target image may be acquired without OCR, through tools such as the Apache PDFBox plug-in.
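A minimal sketch of the OCR route, assuming the pytesseract wrapper around the Tesseract engine (the disclosure does not prescribe an OCR engine, and the file name is hypothetical); the fields kept here are the boxes and text that the later steps consume:

    import pytesseract
    from PIL import Image

    img = Image.open("invoice.png")  # hypothetical target image
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    # Word-level boxes; per-character boxes could be taken from
    # pytesseract.image_to_boxes(img) instead.
    boxes = [(data["left"][i], data["top"][i], data["width"][i],
              data["height"][i], data["text"][i])
             for i in range(len(data["text"])) if data["text"][i].strip()]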
Dividing the text information according to the character spacing of each character in the text information to obtain a plurality of text blocks, comprising: for each character, determining whether there are other characters within a first distance of the character in the horizontal direction; if yes, determining that the other characters and the character belong to the same text block; and/or, for each character, determining whether the character has other characters within a second distance of the vertical direction; and if so, determining that the other character and the character belong to the same text block.
The first distance and the second distance may be adaptively determined according to the identification requirement, and the values of the first distance and the second distance are not limited in this embodiment.
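A minimal sketch of the horizontal grouping described above, assuming character boxes (x, y, w, h, ch) sorted in reading order on one line; max_gap plays the role of the first distance and is a hypothetical value:

    def group_into_text_blocks(chars, max_gap):
        # Characters whose horizontal gap is at most max_gap are merged into
        # the same text block; a larger gap starts a new block.
        blocks, current = [], [chars[0]]
        for prev, cur in zip(chars, chars[1:]):
            gap = cur[0] - (prev[0] + prev[2])  # left edge minus previous right edge
            if gap <= max_gap:
                current.append(cur)
            else:
                blocks.append(current)
                current = [cur]
        blocks.append(current)
        return blocks

    line = [(0, 0, 10, 12, 'i'), (12, 0, 10, 12, 'n'), (60, 0, 10, 12, 'v')]
    print(group_into_text_blocks(line, max_gap=5))  # two blocks: "in" and "v"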
Optionally, each text block comprises at least one character string.
Step 103, obtaining semantic features of each text block in the plurality of text blocks and visual features among different text blocks.
In one example, obtaining semantic features of each text block of a plurality of text blocks comprises: for each text block, inputting the character strings in the text block into a pre-trained first RNN to obtain a feature vector of each character string; and determining semantic features corresponding to the text blocks based on the feature vector of each character string.
In this embodiment, string feature vectors produced by an RNN are taken as the example. In actual implementation, the electronic device may instead compute the feature vector of a character string with a linear regression model, or with word2vec; this embodiment does not limit the manner of obtaining string feature vectors.
Optionally, determining semantic features corresponding to the text block based on the feature vector of each character string includes: splicing the feature vectors of each character string to obtain semantic features; or splicing the feature vectors of each character string to obtain splicing features; acquiring grid characteristics of a target image based on a residual error network ResNet and a void space convolution pooling ASPP; and mixing the splicing features and the grid features to obtain semantic features.
When the feature vectors are spliced, a summary model may be used to weight and superpose them into a fixed-length vector (i.e., the splicing feature).
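A sketch of this step, assuming PyTorch, with attention-style weighted superposition standing in for the summary model; all names and sizes are illustrative:

    import torch
    import torch.nn as nn

    class BlockSemantics(nn.Module):
        # Encode each character string with a GRU, then collapse the string
        # vectors into one fixed-length block vector by learned weights.
        def __init__(self, vocab, emb=32, hid=64):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            self.rnn = nn.GRU(emb, hid, batch_first=True)
            self.score = nn.Linear(hid, 1)

        def forward(self, strings):  # strings: list of LongTensor(length)
            vecs = []
            for s in strings:
                _, h = self.rnn(self.embed(s).unsqueeze(0))
                vecs.append(h.squeeze(0).squeeze(0))  # final hidden state
            V = torch.stack(vecs)                     # (num_strings, hid)
            w = torch.softmax(self.score(V), dim=0)   # summary weights
            return (w * V).sum(dim=0)                 # fixed-length semantic feature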
Optionally, the electronic device is configured with a control that determines whether image features are enabled. When the control enables image features, the electronic device performs the steps of acquiring the grid feature of the target image based on the residual network (ResNet) and atrous spatial pyramid pooling (ASPP), and mixing the splicing feature with the grid feature to obtain the semantic feature; when the control disables image features, the electronic device performs the step of splicing the feature vectors of each character string to obtain the semantic feature.
The state of the control may be set by a user or by another device; this embodiment does not limit how the control is operated.
In one example, obtaining visual features between different text blocks includes: discretizing the relative position between the text block and the associated text block according to the direction and the distance to obtain a direction code and a distance code; inputting the direction code and the distance code into a first embedding model to obtain a direction embedding code, a horizontal distance embedding code and a vertical distance embedding code; and after splicing the direction embedded code, the horizontal distance embedded code and the vertical distance embedded code, projecting the spliced codes to a vector with a fixed length to obtain the visual characteristics.
The visual feature indicates the positional relationship between two associated text blocks. In this embodiment, the text blocks serve as the nodes of a directed graph, and the line connecting the centers of two text blocks having an association relationship serves as an edge of the directed graph, thereby constructing the directed graph. The relative position (center-line vector) of two associated nodes is discretized by direction and by distance. Discretization means: the direction is divided into a number of bins by angle (for example, 360 directions, with adjacent directions differing by 1 degree); for distance, the vertical and horizontal components are divided by the height and width of the target image to obtain a normalized vertical distance and a normalized horizontal distance, which are then multiplied by 1000 and rounded. In this way, an integer direction code and integer distance codes are obtained. The corresponding embedded codes of the direction, horizontal and vertical integer codes are computed through the first embedding model, yielding three embedded codes. These are spliced and projected to a fixed-length vector used as the edge feature of the directed graph.
The first embedding model may be a pre-trained embedding layer.
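A sketch of the discretize-embed-splice-project pipeline, assuming PyTorch; the bin counts follow the 360-direction and x1000 example above, and everything else is a hypothetical choice:

    import math
    import torch
    import torch.nn as nn

    class EdgeVisualFeature(nn.Module):
        # Discretize the center-line vector between two text blocks into
        # integer direction/distance codes, embed each code, then splice and
        # project to a fixed-length edge feature.
        def __init__(self, dim=32):
            super().__init__()
            self.dir_emb = nn.Embedding(360, dim)   # 1-degree direction bins
            self.hx_emb = nn.Embedding(1001, dim)   # normalized distance * 1000
            self.vy_emb = nn.Embedding(1001, dim)
            self.proj = nn.Linear(3 * dim, dim)

        def forward(self, dx, dy, img_w, img_h):
            d = int(math.degrees(math.atan2(dy, dx)) % 360)
            hx = min(int(abs(dx) / img_w * 1000), 1000)
            vy = min(int(abs(dy) / img_h * 1000), 1000)
            codes = [self.dir_emb(torch.tensor(d)),
                     self.hx_emb(torch.tensor(hx)),
                     self.vy_emb(torch.tensor(vy))]
            return self.proj(torch.cat(codes))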
In this embodiment, the association relationship between text blocks may be determined according to the correlation.
Step 104, inputting the characteristic information of each text block into a pre-trained first graph convolution neural network to obtain the text block type and the hidden vector of the text block; the feature information includes semantic features of the text block, semantic features of an associated text block associated with the text block, and visual features between the text block and the associated text block.
Inputting the characteristic information of each text block into a pre-trained first graph convolution neural network to obtain the text block type and the hidden vector of the text block, wherein the method comprises the following steps: performing projection transformation on the characteristic information through a first graph convolution neural network to obtain the weight and the updating information of each kind of characteristic information, wherein the hidden vector comprises the updating information; and superposing the updated information and the weight corresponding to each type of feature information to obtain the text block type.
The weight of the feature information can be determined by adding an attention mechanism in the graph convolution neural network, and the sum of the weights of various feature information is 1.
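For a single node this projection-plus-weighting step might look as follows; a PyTorch sketch under the stated assumption that the weights sum to 1, not the patented implementation itself:

    import torch
    import torch.nn as nn

    class TypeInferenceLayer(nn.Module):
        # Project each kind of feature (own semantics, neighbor semantics,
        # edge visual feature) into update information plus an attention
        # score; the softmax makes the weights sum to 1, and the weighted
        # superposition feeds the type classifier.
        def __init__(self, dim, num_types):
            super().__init__()
            self.update = nn.Linear(dim, dim)
            self.attn = nn.Linear(dim, 1)
            self.classify = nn.Linear(dim, num_types)

        def forward(self, feats):                    # feats: (num_kinds, dim)
            U = torch.tanh(self.update(feats))       # update information
            w = torch.softmax(self.attn(feats), 0)   # weights summing to 1
            h = (w * U).sum(dim=0)                   # superposed node state
            return self.classify(h), h               # type logits, hidden vector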
Step 105, inputting the hidden vector of each text block, the type of the text block and character feature information of the character into a preset character model to obtain the character type of the character for each character in each text block; the character feature information is used to indicate the position of the character.
In one example, the character feature information is a normalized feature of the character position. Inputting, for each character in each text block, the hidden vector of the text block, the text block type and the character feature information of the character into a preset character model to obtain the character type of the character includes: splicing the hidden vector with the type embedded code of the text block type through the character model to obtain state information; splicing the character with the normalized features and inputting the spliced features into a second RNN to obtain a character hidden vector; and splicing the state information with the character hidden vector and inputting the result into a third RNN to obtain the character type of the character.
For example, the character model performs the following operations: splice the node hidden vector and the type embedded code into state information; splice the character string with the normalized position features and compute a character hidden vector through an RNN; then expand the state information, splice it with the character hidden vector, and perform the second RNN calculation to obtain a character-level type probability vector.
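A sketch of that data flow, assuming PyTorch GRUs play the roles of the two RNNs; dimensions and names are illustrative assumptions:

    import torch
    import torch.nn as nn

    class CharacterModel(nn.Module):
        # State = [block hidden vector ; block-type embedding], broadcast to
        # every character; RNN 1 encodes [char embedding ; normalized (x, y)],
        # RNN 2 maps [state ; char hidden] to per-character type logits.
        def __init__(self, vocab, n_block_types, n_char_types, dim=64):
            super().__init__()
            self.char_emb = nn.Embedding(vocab, dim)
            self.type_emb = nn.Embedding(n_block_types, dim)
            self.rnn1 = nn.GRU(dim + 2, dim, batch_first=True)
            self.rnn2 = nn.GRU(3 * dim, dim, batch_first=True)
            self.out = nn.Linear(dim, n_char_types)

        def forward(self, block_h, block_type, chars, positions):
            state = torch.cat([block_h, self.type_emb(block_type)])   # (2*dim,)
            x = torch.cat([self.char_emb(chars), positions], dim=-1)  # (T, dim+2)
            h, _ = self.rnn1(x.unsqueeze(0))                          # (1, T, dim)
            s = state.expand(1, h.size(1), -1)                        # broadcast state
            y, _ = self.rnn2(torch.cat([s, h], dim=-1))
            return self.out(y.squeeze(0))                             # (T, n_char_types)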
Step 106, inputting the character information of each character into a pre-trained second graph convolution neural network to obtain edge attributes between the characters and associated characters; the character information includes a type embedding code of a character type of the character and a visual feature between the character and an associated character.
In one example, the type embedded code is obtained by inputting the character type into a second embedding model; the second embedding model may also be a pre-trained embedding layer. The electronic device performs projection transformation on the character information through the second graph convolution neural network to obtain update information and a weight; multiplies the weight by the update information to obtain weighted update information; and determines the edge attribute based on the weighted update information.
For example: for each character node, the second graph convolution neural network splices the information of the nodes associated with an edge, and the attribute of the edge is computed through a multilayer feed-forward network.
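A per-edge sketch, assuming PyTorch; gating the update information with a learned weight mirrors the multiplication step above, and a small MLP stands in for the multilayer feed-forward network:

    import torch
    import torch.nn as nn

    class EdgePredictor(nn.Module):
        # Splice the two endpoint type embeddings with the edge's visual
        # feature, form weighted update information, and output the Boolean
        # edge attribute as a "same entity" probability.
        def __init__(self, dim):
            super().__init__()
            self.update = nn.Linear(3 * dim, dim)
            self.attn = nn.Linear(3 * dim, 1)
            self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, 1))

        def forward(self, type_a, type_b, visual):    # each: (dim,)
            e = torch.cat([type_a, type_b, visual])   # (3*dim,)
            u = torch.tanh(self.update(e)) * torch.sigmoid(self.attn(e))
            return torch.sigmoid(self.mlp(u))         # P(edge attribute is True)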
In this embodiment, a directed graph is created according to the character types and the input character positions, and a Boolean attribute is computed for each edge of this graph. If the Boolean attribute of an edge is True, the character nodes connected by the edge belong to the same entity block, and the edges whose attribute is True are completed into the minimal closure of an equivalence relation (that is, if a->b and b->c are True, then a->c, c->b and b->a are also taken as True, even though a->c or b->a may have been predicted False in the actual calculation).
Step 107, splicing the characters having the same edge attribute according to the association relationship to obtain the identified entity blocks.
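Because the True edges are completed into an equivalence relation, splicing amounts to collecting connected components; a small union-find sketch in plain Python, with all names hypothetical:

    def splice_entities(num_chars, true_edges):
        # Union-find over character indices: edges predicted True define an
        # equivalence relation, and each component is one entity block.
        parent = list(range(num_chars))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for a, b in true_edges:                # a~b and b~c imply a~c
            parent[find(a)] = find(b)

        groups = {}
        for i in range(num_chars):
            groups.setdefault(find(i), []).append(i)
        return list(groups.values())

    print(splice_entities(5, [(0, 1), (1, 2)]))  # [[0, 1, 2], [3], [4]]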
In summary, in the information identification method based on the graph convolution neural network provided in this embodiment, the text information in the target image is divided into a plurality of text blocks; semantic features of each text block and visual features between different text blocks are obtained; the feature information of each text block is input into a pre-trained first graph convolution neural network to obtain the text block type and the hidden vector of the text block, the feature information comprising the semantic features of the text block, the semantic features of the associated text blocks, and the visual features between the text block and the associated text blocks; for each character in each text block, the hidden vector of the text block, the text block type and the character feature information of the character (which indicates the position of the character) are input into a preset character model to obtain the character type of the character; the character information of each character, namely the type embedded code of its character type and the visual features between the character and its associated characters, is input into a pre-trained second graph convolution neural network to obtain the edge attributes between the character and the associated characters; and the characters with the same edge attributes are spliced according to the association relationship to obtain the identified entity blocks. This solves the problem of low accuracy when information identification relies on semantic features alone: because the graph convolution neural network is used for both text-block-level and character-level type inference, type inference combines semantic and spatial features, and the accuracy of information identification is improved. Optionally, type inference can combine the three kinds of features, semantic, image and spatial, to further improve the accuracy of information identification.
In addition, performing type inference with the graph convolution neural network shortens what would originally take several processing passes: recognition from text information to entity information is achieved in only two stages, which improves the efficiency of information identification.
In addition, consider building a directed graph that takes individual characters as nodes and a fixed distance as the neighborhood, and identifying the entities corresponding to the characters with a graph convolution neural network on that graph: the graph to be built has a large number of nodes and edges, the graph convolution neural network is difficult to train, and the memory footprint is excessive. For this reason, the present application divides type inference into two stages, and in the second stage, because the prediction logic and feature information of an edge are simple compared with the foregoing manner, the amount of computation of the graph convolution neural network can be reduced.
Fig. 2 is a block diagram of an information identification apparatus based on a graph convolution neural network according to an embodiment of the present application. The device at least comprises the following modules: an image acquisition module 210, a text segmentation module 220, a feature acquisition module 230, a first classification module 240, a second classification module 250, an attribute calculation module 260, and an entity identification module 270.
An image obtaining module 210, configured to obtain a target image with text information;
a text dividing module 220, configured to divide the text information into a plurality of text blocks, where each text block includes at least one character;
a feature obtaining module 230, configured to obtain a semantic feature of each text block in the plurality of text blocks and a visual feature between different text blocks;
the first classification module 240 is configured to input the feature information of each text block into a pre-trained first graph convolution neural network, so as to obtain a text block type and a hidden vector of the text block; the feature information includes semantic features of the text block, semantic features of an associated text block associated with the text block, and visual features between the text block and the associated text block;
a second classification module 250, configured to, for each character in each text block, input a hidden vector of the text block, a text block type, and character feature information of the character into a preset character model, so as to obtain a character type of the character; the character characteristic information is used for indicating the position of the character;
the attribute calculation module 260 is configured to input the character information of each character into a pre-trained second graph convolution neural network, so as to obtain an edge attribute between the character and an associated character; the character information comprises type embedded codes of character types of the characters and visual features between the characters and associated characters;
and the entity identification module 270 is configured to splice the characters with the same edge attributes according to the association relationship, so as to obtain an identified entity block.
For relevant details reference is made to the above-described method embodiments.
It should be noted that: when the information identification apparatus based on the graph convolution neural network provided in the above embodiment performs information identification, the division into the above functional modules is only an example; in practical applications, the functions may be distributed among different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the information identification apparatus provided in the above embodiment and the information identification method based on the graph convolution neural network belong to the same concept; the specific implementation process is described in the method embodiment and is not repeated here.
Fig. 3 is a block diagram of an information identification apparatus based on a graph convolution neural network according to an embodiment of the present application. The apparatus comprises at least a processor 301 and a memory 302.
Processor 301 may include one or more processing cores, such as: 4 core processors, 8 core processors, etc. The processor 301 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 301 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 301 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 301 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 302 may include one or more computer-readable storage media, which may be non-transitory. Memory 302 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 302 is used to store at least one instruction for execution by processor 301 to implement the method for information identification based on a graph-convolution neural network provided by method embodiments herein.
In some embodiments, the information identification device based on the graph convolution neural network may further include: a peripheral interface and at least one peripheral. The processor 301, memory 302 and peripheral interface may be connected by bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.
Of course, the information identification apparatus based on the graph convolution neural network may further include fewer or more components, which is not limited in this embodiment.
Optionally, the present application further provides a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the information identification method based on the graph convolution neural network of the above method embodiment.
Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and the program is loaded and executed by a processor to implement the information identification method based on the graph convolution neural network of the above-mentioned method embodiment.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above is only one specific embodiment of the present application, and any other modifications based on the concept of the present application are considered as the protection scope of the present application.

Claims (10)

1. An information identification method based on a graph convolution neural network is characterized by comprising the following steps:
acquiring a target image with text information;
dividing the text information into a plurality of text blocks, wherein each text block comprises at least one character;
obtaining semantic features of each text block in the plurality of text blocks and visual features among different text blocks;
inputting the characteristic information of each text block into a pre-trained first graph convolution neural network to obtain the text block type and the hidden vector of the text block; the feature information includes semantic features of the text block, semantic features of an associated text block associated with the text block, and visual features between the text block and the associated text block;
for each character in each text block, inputting the hidden vector of the text block, the type of the text block and the character feature information of the character into a preset character model to obtain the character type of the character; the character characteristic information is used for indicating the position of the character;
inputting the character information of each character into a pre-trained second graph convolution neural network to obtain the edge attribute between the character and the associated character; the character information comprises type embedded codes of character types of the characters and visual features between the characters and associated characters;
and splicing the characters with the same edge attributes according to the association relationship to obtain the identified entity block.
2. The method of claim 1, wherein obtaining semantic features of each text block of the plurality of text blocks comprises:
for each text block, inputting the character strings in the text block into a pre-trained first Recurrent Neural Network (RNN) to obtain a feature vector of each character string;
and determining semantic features corresponding to the text blocks based on the feature vector of each character string.
3. The method of claim 2, wherein the determining semantic features corresponding to the text block based on the feature vector of each character string comprises:
splicing the feature vectors of each character string to obtain the semantic features;
or,
splicing the feature vectors of each character string to obtain a splicing feature; acquiring a grid feature of the target image based on a residual network (ResNet) and atrous spatial pyramid pooling (ASPP); and mixing the splicing feature and the grid feature to obtain the semantic feature.
4. The method of claim 1, wherein obtaining visual features between different text blocks comprises:
discretizing the relative position between the text block and the associated text block according to the direction and the distance to obtain a direction code and a distance code;
inputting the direction code and the distance code into a first embedding model to obtain a direction embedding code, a horizontal distance embedding code and a vertical distance embedding code;
and after splicing the direction embedded code, the horizontal distance embedded code and the vertical distance embedded code, projecting the spliced codes to a vector with a fixed length to obtain the visual features.
5. The method of claim 1, wherein the inputting the feature information of each text block into a pre-trained first graph convolution neural network to obtain a text block type and a hidden vector of the text block comprises:
performing projection transformation on the characteristic information through the first graph convolution neural network to obtain the weight and the updating information of each type of characteristic information, wherein the hidden vector comprises the updating information;
and superposing the update information and the weight corresponding to each type of feature information to obtain the text block type.
6. The method according to claim 1, wherein the character feature information is a normalized feature of the character position; and the inputting, for each character in each text block, the hidden vector of the text block, the text block type and the character feature information of the character into a preset character model to obtain the character type of the character comprises:
splicing the hidden vector with the type embedded code of the text block type through the character model to obtain state information;
splicing the characters and the normalized features, and inputting the spliced features into a second RNN network to obtain hidden vectors of the characters;
and splicing the state information and the character hidden vector, and inputting the spliced result into a third RNN to obtain the character type of the character.
7. The method of claim 1, wherein the type embedded code is obtained by inputting the character type into a second embedding model; and the inputting the character information of each character into a pre-trained second graph convolution neural network to obtain an edge attribute between the character and the associated character comprises:
performing projection transformation on the type embedded code and the visual features through the second graph convolution neural network to obtain update information and a weight;
multiplying the weight by the updating information to obtain weighted updating information;
determining the edge attribute based on the weighted update information.
8. The method of claim 1, wherein the dividing the text information into a plurality of text blocks comprises:
and dividing the text information according to the character spacing of each character in the text information to obtain the plurality of text blocks.
9. An information recognition apparatus based on a graph convolution neural network, the apparatus comprising a processor and a memory; the memory stores a program, and the program is loaded and executed by the processor to implement the information identification method based on the graph convolution neural network of any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the storage medium stores a program which, when loaded and executed by a processor, implements the information identification method based on the graph convolution neural network of any one of claims 1 to 8.
CN202110224516.4A 2021-03-01 2021-03-01 Information identification method, device and storage medium based on graph convolution neural network Active CN112949477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110224516.4A CN112949477B (en) 2021-03-01 2021-03-01 Information identification method, device and storage medium based on graph convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110224516.4A CN112949477B (en) 2021-03-01 2021-03-01 Information identification method, device and storage medium based on graph convolution neural network

Publications (2)

Publication Number Publication Date
CN112949477A true CN112949477A (en) 2021-06-11
CN112949477B CN112949477B (en) 2024-03-15

Family

ID=76246866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110224516.4A Active CN112949477B (en) 2021-03-01 2021-03-01 Information identification method, device and storage medium based on graph convolution neural network

Country Status (1)

Country Link
CN (1) CN112949477B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343982A (en) * 2021-06-16 2021-09-03 北京百度网讯科技有限公司 Entity relationship extraction method, device and equipment for multi-modal feature fusion
CN114283403A (en) * 2021-12-24 2022-04-05 北京有竹居网络技术有限公司 Image detection method, device, storage medium and equipment
CN114937277A (en) * 2022-05-18 2022-08-23 北京百度网讯科技有限公司 Image-based text acquisition method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236135A1 (en) * 2018-01-30 2019-08-01 Accenture Global Solutions Limited Cross-lingual text classification
CN110222330A (en) * 2019-04-26 2019-09-10 平安科技(深圳)有限公司 Method for recognizing semantics and device, storage medium, computer equipment
CN110569500A (en) * 2019-07-23 2019-12-13 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN110598206A (en) * 2019-08-13 2019-12-20 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN110765872A (en) * 2019-09-19 2020-02-07 中山大学 Online mathematical education resource classification method based on visual features
CN111259672A (en) * 2020-02-12 2020-06-09 新疆大学 Chinese tourism field named entity identification method based on graph convolution neural network
CN111753822A (en) * 2019-03-29 2020-10-09 北京市商汤科技开发有限公司 Text recognition method and device, electronic equipment and storage medium
CN111967387A (en) * 2020-08-17 2020-11-20 北京市商汤科技开发有限公司 Form recognition method, device, equipment and computer readable storage medium
CN112084790A (en) * 2020-09-24 2020-12-15 中国民航大学 Relation extraction method and system based on pre-training convolutional neural network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236135A1 (en) * 2018-01-30 2019-08-01 Accenture Global Solutions Limited Cross-lingual text classification
CN111753822A (en) * 2019-03-29 2020-10-09 北京市商汤科技开发有限公司 Text recognition method and device, electronic equipment and storage medium
CN110222330A (en) * 2019-04-26 2019-09-10 平安科技(深圳)有限公司 Method for recognizing semantics and device, storage medium, computer equipment
CN110569500A (en) * 2019-07-23 2019-12-13 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN110598206A (en) * 2019-08-13 2019-12-20 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN110765872A (en) * 2019-09-19 2020-02-07 中山大学 Online mathematical education resource classification method based on visual features
CN111259672A (en) * 2020-02-12 2020-06-09 新疆大学 Chinese tourism field named entity identification method based on graph convolution neural network
CN111967387A (en) * 2020-08-17 2020-11-20 北京市商汤科技开发有限公司 Form recognition method, device, equipment and computer readable storage medium
CN112084790A (en) * 2020-09-24 2020-12-15 中国民航大学 Relation extraction method and system based on pre-training convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DIFEI GAO et al.: "Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text", CVF *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343982A (en) * 2021-06-16 2021-09-03 北京百度网讯科技有限公司 Entity relationship extraction method, device and equipment for multi-modal feature fusion
CN114283403A (en) * 2021-12-24 2022-04-05 北京有竹居网络技术有限公司 Image detection method, device, storage medium and equipment
CN114283403B (en) * 2021-12-24 2024-01-16 北京有竹居网络技术有限公司 Image detection method, device, storage medium and equipment
CN114937277A (en) * 2022-05-18 2022-08-23 北京百度网讯科技有限公司 Image-based text acquisition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112949477B (en) 2024-03-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant