CN116187310A - Document-level relation extraction method, device, equipment and storage medium


Info

Publication number
CN116187310A
Authority
CN
China
Prior art keywords
document
adjacency matrix
initial vector
words
adjacent
Prior art date
Legal status
Pending
Application number
CN202211093370.5A
Other languages
Chinese (zh)
Inventor
周越
杨洋
Current Assignee
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd filed Critical Shanghai Pudong Development Bank Co Ltd
Priority to CN202211093370.5A
Publication of CN116187310A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention discloses a document-level relation extraction method, apparatus, device and storage medium. The method comprises the following steps: constructing an initial vector based on the words in a document; performing syntactic analysis on the document, and constructing an adjacency matrix corresponding to the initial vector based on the dependency relationships among the words; and inputting the initial vector and the adjacency matrix into a graph convolutional neural network model with an added multi-head attention mechanism to obtain a document-level relation extraction result. The technical scheme provided by the embodiment of the invention can improve the accuracy of relation extraction results.

Description

Document-level relation extraction method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of natural language processing, in particular to a method, a device, equipment and a storage medium for extracting a document-level relation.
Background
Document-level relation extraction, as a form of information extraction, aims to detect relations between entities quickly and accurately from massive amounts of information, and can provide a basis for many kinds of research; improving the efficiency and accuracy of relation extraction therefore has practical significance.
In the related art, the existing methods for document-level relation extraction have certain defects, which affect the accuracy of relation extraction results.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for extracting a document-level relation, which can improve the accuracy of a relation extraction result.
In a first aspect, an embodiment of the present invention provides a document-level relationship extraction method, including:
constructing an initial vector based on words in the document;
performing syntactic analysis on the document, and constructing an adjacency matrix corresponding to the initial vector based on the dependency relationships among words;
and inputting the initial vector and the adjacency matrix into a graph convolution neural network model added with a multi-head attention mechanism to obtain a document-level relation extraction result.
In a second aspect, an embodiment of the present invention provides a document-level relationship extraction apparatus, including:
a first construction module for constructing an initial vector based on words in the document;
the second construction module is used for carrying out syntactic analysis on the document and constructing an adjacency matrix corresponding to the initial vector based on the dependency relationship among words;
and the relation extraction module is used for inputting the initial vector and the adjacency matrix into a graph convolution neural network model added with a multi-head attention mechanism to obtain a document-level relation extraction result.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the methods provided by the embodiments of the present invention.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium storing computer instructions for causing a processor to execute a method provided by embodiments of the present invention.
According to the technical scheme provided by the embodiment of the invention, the initial vector is constructed through the words in the document, the adjacency matrix is constructed through the dependency relationship among the words, the initial vector and the adjacency matrix are input into the graph convolution neural network model added with the multi-head attention mechanism, the document-level relationship extraction result is obtained, and the accuracy of the relationship extraction result can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for extracting document-level relationships provided by an embodiment of the invention;
FIG. 2 is a flow chart of a method for extracting a document-level relationship according to an embodiment of the present invention;
FIG. 3 is a block diagram of a document level relationship extraction apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Two common document-level relation extraction methods exist in the related art: document-level relation extraction based on a sequence model, and document-level relation extraction based on dependency relationships. The sequence-model approach operates only on the word sequence: taking the whole document as context, it extracts sentence-level features through a sequence model, obtains sentence representations through a pooling layer and a nonlinear layer, and finally judges the relation between entities through a fully connected layer combined with softmax. The drawback of this scheme is that, although a sequence model has good extraction capability for local information, in a document-level relation extraction task it ignores the dependency relationships among sentences, cannot fully exploit the structural information of entities and relations, and the extracted inter-entity relations lack strongly supporting evidence sentences, so interpretability is poor.
Fig. 1 is a flowchart of a document-level relation extraction method provided by an embodiment of the present invention. The method may be performed by a document-level relation extraction apparatus, which may be implemented in software and/or hardware and configured in an electronic device such as a terminal or a server. The method may be applied to relation extraction between entities located in different sentences of a document.
As shown in fig. 1, the technical solution provided by the embodiment of the present invention includes:
s110: an initial vector is constructed based on words in the document.
In one implementation of the embodiment of the present invention, constructing an initial vector based on words in a document includes: encoding the words in the document to obtain encoding information, and constructing the initial vector based on the encoding information. Specifically, words may be extracted from the document and each word encoded, for example in a binary form, to obtain its encoding information; the initial vector is then formed from the encoding information, with the encoding of each word arranged in a fixed order.
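As an illustration only, the following is a minimal Python sketch of this step, assuming a simple binary encoding of vocabulary indices; the patent does not fix a concrete encoder, so the function names (build_initial_vectors, encode_word) and parameters here are illustrative, not the claimed implementation.

```python
# A minimal sketch of S110, assuming a binary-form encoding of vocabulary indices.
from typing import List
import numpy as np

def encode_word(index: int, bits: int = 16) -> np.ndarray:
    # Binary-form encoding of a word's vocabulary index, as the text suggests.
    return np.array([(index >> b) & 1 for b in range(bits)], dtype=np.float32)

def build_initial_vectors(words: List[str], bits: int = 16) -> np.ndarray:
    # One row per word, kept in document order, forming the initial vectors X (N x d).
    vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
    return np.stack([encode_word(vocab[w], bits) for w in words])

X = build_initial_vectors("AAAA was founded in 1992".split())  # X.shape == (5, 16)
```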
S120: analyzing the document, and constructing an adjacency matrix corresponding to the initial vector based on the dependency relationship among words.
In an embodiment of the invention, any two words may have a dependency relationship between them, from which an adjacency matrix characterizing the dependency relationships between the words can be constructed. In one embodiment, optionally, constructing the adjacency matrix corresponding to the initial vector based on the dependency relationships between words includes:
if the i-th word is the dependency object of the j-th word, then A_ij = n2; otherwise A_ij = n1, where A_ij is the value at the i-th row and j-th column of the adjacency matrix A;
in the adjacency matrix, if i < N and j = i - 1, then A_ij = n2;
in the adjacency matrix, if i < N, then A_ii = n1;
if a coreference relation holds between the i-th word and the j-th word, then A_ij = n2 and A_ji = n2;
in the adjacency matrix, the adjacency values of all words in non-evidence sentences are set to n1;
in the adjacency matrix, the positions corresponding to the dependency relationships between nodes on adjacent-sentence edges are set to n2, where the adjacent-sentence edges comprise edges obtained by ordering the supporting evidence sentences and connecting the root nodes of adjacent evidence sentences. Here n1 denotes non-adjacent and n2 denotes adjacent; the evidence sentences are characterized by a document dependency tree. In particular, n1 may be 0 and n2 may be 1.
In this embodiment, optionally, syntactic dependency edges may be introduced, with syntactic analysis performed using Stanford CoreNLP; that is, if the i-th word is the dependency object of the j-th word, A_ij = 1, otherwise A_ij = 0.
In this embodiment, optionally, adjacent-word edges and self-loop edges may be introduced. Specifically, the order of the words in a sentence is preserved by introducing adjacent-word edges, i.e., for any i < N and j = i - 1, let A_ij = 1. Because the dependency object of a word is never the word itself in dependency analysis, A_ii = 0 for any i < N after parsing; to avoid losing a node's own information during propagation, an edge pointing to itself is therefore added for each node.
In this embodiment, optionally, coreference edges may be introduced. Coreference means that words appearing in different positions, whose expressions are not exactly identical, refer to the same entity. For example, in "AAAA was founded in 1992 and is a joint-stock commercial bank. AA continually improves its business capability and quality levels.", both "AAAA" and "AA" denote the same bank, although the two expressions differ considerably. Without introducing external knowledge, it is difficult to establish the connection between them through semantic understanding alone; the coreference relation therefore plays an irreplaceable role in establishing non-local dependency relationships. The coreference relation is symmetric, so coreference edges are bidirectional: if a coreference relation holds between the i-th word and the j-th word, then A_ij = 1 and A_ji = 1.
In the embodiment of the invention, optionally, non-evidence sentences can be filtered out when constructing the adjacency matrix. Specifically, non-evidence sentences can be filtered by a method from the related art; this removes noise information, reduces the computational complexity of the model, and reduces its computation load. The adjacency values of all words in all non-evidence sentences can be reset to zero. For the notion of an evidence sentence, reference may be made to the related art; for example, an evidence sentence may be a sentence related to the entities of an entity pair. Filtering non-evidence sentences thus saves computation time while effectively retaining the key information.
In the embodiment of the invention, optionally, the supporting evidence sentences are ordered in sequence, and the root nodes of adjacent supporting evidence sentences are connected by an edge, called an adjacent-sentence edge. In the adjacency matrix, the positions corresponding to the dependency relationships between the nodes on adjacent-sentence edges are set to 1. A sentence can be represented by a document dependency tree (for the specific method, refer to the related art), and the root node of the document dependency tree may be the subject or another component of the sentence. A code sketch combining these adjacency-matrix rules follows.
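The sketch below is a hedged illustration of the rules above, taking n1 = 0 and n2 = 1 as the text permits. The inputs dependencies, coref_pairs, evidence_mask, and sentence_roots are assumed to come from upstream tools such as a Stanford CoreNLP parse and an evidence-sentence predictor; their names and exact form are illustrative, not prescribed by the patent.

```python
# A hedged sketch of the adjacency-matrix construction described in S120.
import numpy as np

def build_adjacency(n_words, dependencies, coref_pairs, evidence_mask, sentence_roots):
    """dependencies: (i, j) pairs where word i is the dependency object of word j;
    coref_pairs: (i, j) coreferent word pairs; evidence_mask: boolean array, True
    for words in evidence sentences; sentence_roots: root-word indices of the
    ordered evidence sentences."""
    A = np.zeros((n_words, n_words), dtype=np.float32)
    for i, j in dependencies:            # syntactic dependency edges
        A[i, j] = 1.0
    for i in range(1, n_words):          # adjacent-word edges keep word order: j = i - 1
        A[i, i - 1] = 1.0
    np.fill_diagonal(A, 1.0)             # self-loops so a node keeps its own information
    for i, j in coref_pairs:             # coreference edges are bidirectional
        A[i, j] = A[j, i] = 1.0
    A[~evidence_mask, :] = 0.0           # filter words of non-evidence sentences
    A[:, ~evidence_mask] = 0.0
    for r1, r2 in zip(sentence_roots, sentence_roots[1:]):
        A[r1, r2] = A[r2, r1] = 1.0      # adjacent-sentence edges between evidence roots
    return A
```

Connecting the adjacent-sentence edge in both directions is an assumption; the text says only that adjacent evidence-sentence roots are "connected by one edge".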
S130: and inputting the initial vector and the adjacency matrix into a graph convolution neural network model added with a multi-head attention mechanism to obtain a document-level relation extraction result.
In an embodiment of the present invention, optionally, the input to the graph convolutional neural network model with the added multi-head attention mechanism may be a tuple composed of the initial vector and the adjacency matrix. The tuple can be expressed as G = &lt;X, A&gt;, where G denotes a graph containing N nodes in total, X ∈ R^(N×d) denotes the initial vectors of the network nodes, and d is the vector dimension. A denotes the adjacency matrix. The graph is directed, so the adjacency matrix A is not a symmetric matrix; A_ij indicates that there is a directed edge from the i-th node to the j-th node. A node may be a word.
In one implementation of the embodiment of the present invention, optionally, inputting the initial vector and the adjacency matrix into the graph convolutional neural network model with the added attention mechanism to obtain the relationship between two target words includes: inputting the initial vector and the adjacency matrix into the graph convolution module with the added multi-head attention mechanism to obtain multiple groups of output results; integrating the multiple groups of output results through a linear combination layer to obtain an integrated output result; and inputting the integrated output result into a fully connected layer and outputting the relationship between the target entity pair.
The graph convolutional neural network model comprises a plurality of graph convolution modules, and each graph convolution module comprises a plurality of graph convolution layers. The initial vector and the adjacency matrix are input to the graph convolution modules with the multi-head attention mechanism, and each graph convolution module outputs a corresponding result, yielding multiple groups of output results.
In an embodiment of the present invention, a graph convolution layer may be defined as follows: its input is the states of all nodes in the network at layer l, H^(l) ∈ R^(N×d_l), together with the adjacency matrix A, and its output is the updated network node states H^(l+1) ∈ R^(N×d_(l+1)), where N denotes the number of network nodes, A denotes the adjacency matrix, and d_l and d_(l+1) are the vector dimensions of a node at layer l and layer l+1, respectively. The update process of a network node can be understood as the node gathering the information of its neighbor nodes and merging it with its own information, as in the following formula:

h_i^(l+1) = f( g( { h_j^(l) : j ∈ N(i) }, h_i^(l) ) )

where this is one layer of the graph convolutional neural network (GCN) model, N(i) denotes the set of neighbor nodes of the i-th node, f denotes an activation function, and g denotes an aggregation function.
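For illustration, a minimal sketch of one such layer follows. Summation over the rows selected by A is an assumed choice for the aggregation function g, and ReLU an assumed choice for the activation f; the patent leaves both abstract.

```python
# A minimal sketch of one graph convolution layer as defined above.
import numpy as np

def gcn_layer(H: np.ndarray, A: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """H: (N, d_l) node states; A: (N, N) adjacency (self-loops included, so each
    node merges its own state); W: (d_l, d_l1) layer weights; b: (d_l1,) bias."""
    agg = A @ H                          # g: each node sums its neighbors' states
    return np.maximum(agg @ W + b, 0.0)  # f: ReLU over the merged representation
```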
In the embodiment of the invention, applying a multi-head attention mechanism to the graph convolution module (that is, multiple kernels in the graph convolutional neural network model) lets different heads attend to different features and extract richer information. On the other hand, multiple heads reduce the scale of the dot-product operations, help prevent the gradient from vanishing, reduce the dimension of the attention matrix, and reduce the number of parameters.
In the embodiment of the invention, the output results of the graph convolution modules are integrated through the linear combination layer to obtain an integrated output result; specifically, integration may be performed by the following formula:

h_comb = W_comb h_out + b_comb

where h_comb is the integrated output result; h_out denotes the concatenated outputs of the M separate graph convolution modules, i.e., h_out = [h_1; h_2; …; h_M]; W_comb ∈ R^((d·N)×d) is the weight matrix of this layer; and b_comb is the bias vector of the linear combination layer.
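A sketch of this linear combination layer follows: the M module outputs are concatenated and passed through one affine map. The per-token concatenation and the concrete dimensions here are illustrative assumptions; the formula's shapes govern in the patent's notation.

```python
# Sketch of the linear combination layer: h_comb = W_comb h_out + b_comb.
import numpy as np

def linear_combine(h_outs, W_comb, b_comb):
    """h_outs: list of M arrays of shape (N, d); W_comb: (M*d, d); b_comb: (d,)."""
    h_out = np.concatenate(h_outs, axis=-1)  # h_out = [h_1; h_2; ...; h_M]
    return h_out @ W_comb + b_comb           # integrated output result h_comb
```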
In one implementation of the embodiment of the present invention, optionally, inputting the integrated output result to the fully connected layer and outputting the relationship between the target entity pair includes: extracting, from the integrated output result, the representation information belonging to the head entity and to the tail entity respectively, and performing a max-pooling operation on each to obtain the head-entity representation information and the tail-entity representation information; and classifying the relationship between the head-entity representation information and the tail-entity representation information through a fully connected layer to obtain the relationship between the head entity and the tail entity.
Specifically, through the linear combination layer, representations of all words (tokens) can be obtained, and the relationship between a target entity pair is predicted based on these token representations. First, a target entity is represented by the following formula:

h_head = f( masked_select_head( h_comb ) )

where h_head denotes the head-entity representation; the masked_select operation extracts, from all the tokens in the integrated output result, the representation information of the tokens belonging to the head entity; and f denotes a max-pooling operation that fuses the vector representations of n tokens into one. The representation information h_tail of the tail entity is obtained in the same way. The head-entity and tail-entity representations are concatenated and passed through the fully connected layer, which maps them to the dimension of the relation categories as the final output:

logits = FC( [h_head; h_tail] )

Because the relation type between an entity pair may not be unique, relation extraction can be regarded as a multi-label binary classification problem. At prediction time, instead of taking only the highest-scoring class, every predicted class whose score exceeds an experimentally set threshold is output as a relation between the entity pair.
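A hedged sketch of this prediction head follows. Boolean masks stand in for the masked_select operation, max pooling plays the role of f, and a sigmoid with a 0.5 cutoff stands in for the experimentally set threshold; all of these concrete choices and shapes are assumptions for illustration.

```python
# A sketch of the prediction head: masked_select + max pooling, FC, thresholding.
import numpy as np

def entity_repr(h_comb: np.ndarray, token_mask: np.ndarray) -> np.ndarray:
    # Gather the entity's token rows and fuse them into one vector (f: max pooling).
    return h_comb[token_mask].max(axis=0)

def predict_relations(h_comb, head_mask, tail_mask, W_fc, b_fc, threshold=0.5):
    h_head = entity_repr(h_comb, head_mask)
    h_tail = entity_repr(h_comb, tail_mask)
    logits = np.concatenate([h_head, h_tail]) @ W_fc + b_fc  # FC([h_head; h_tail])
    scores = 1.0 / (1.0 + np.exp(-logits))    # multi-label: one score per relation class
    return np.flatnonzero(scores > threshold) # every class above the set threshold
```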
According to the technical scheme provided by the embodiment of the invention, the initial vector is constructed through the words in the document, the adjacency matrix is constructed through the dependency relationship among the words, the initial vector and the adjacency matrix are input into the graph convolution neural network model added with the multi-head attention mechanism, the document-level relationship extraction result is obtained, and the accuracy of the relationship extraction result can be improved.
FIG. 2 is a flowchart of a document-level relation extraction method according to an embodiment of the present invention, in which the relevant steps are optimized on the basis of the above embodiment. Optionally, inputting the initial vector and the adjacency matrix into the graph convolutional neural network model with the added attention mechanism to obtain the relationship between two target words includes:
inputting the initial vector and the adjacency matrix to a graph convolution module added with a multi-head attention mechanism to obtain a plurality of groups of output results;
integrating the multiple groups of output results through a linear combination layer to obtain an integrated output result;
and inputting the integration output result to a full-connection layer, and outputting the relation between the target entity pairs.
Optionally, the inputting the initial vector and the adjacency matrix to a graph convolution module adding a multi-head attention mechanism to obtain multiple groups of output results includes:
simplifying the adjacency matrix based on a document dependency tree, and adding a multi-head attention mechanism to the simplified adjacency matrix to obtain a plurality of weight matrices;
and respectively carrying out graph convolution operation on the initial vector and each weight matrix to obtain a corresponding output result.
The simplifying the adjacency matrix based on the document dependency tree comprises the following steps:
pruning the non-evidence sentence through the document dependency tree;
in the adjacency matrix, setting the position corresponding to the dependency relationship among words in the pruned non-evidence sentence as n1; wherein n1 represents non-adjacent.
As shown in fig. 2, the technical solution provided by the embodiment of the present invention includes:
s210: an initial vector is constructed based on words in the document.
S220: analyzing the document, and constructing an adjacency matrix corresponding to the initial vector based on the dependency relationship among words.
S230: simplifying the adjacent matrix based on the document dependency tree, and adding a multi-head attention mechanism to the simplified adjacent matrix to obtain a plurality of weight matrixes.
In an embodiment of the present invention, optionally, the simplifying the adjacency matrix based on the document dependency tree includes: pruning the non-evidence sentence through the document dependency tree; in the adjacency matrix, setting the position corresponding to the dependency relationship among words in the pruned non-evidence sentence as n1; wherein n1 represents non-adjacent.
In the embodiment of the invention, optionally, n1 may be 0, and the document dependency tree may be used to filter non-evidence sentences; for the filtering method and the construction of the document dependency tree, reference may be made to the related art. In this embodiment, in the adjacency matrix, the positions corresponding to the dependency relationships between the words in the pruned non-evidence sentences are set to 0, which simplifies the adjacency matrix. Pruning the document dependency tree through dependency relationships from the related art can retain the important information while reaching a minimal tree scale: an over-large and over-sparse document dependency tree makes model training time-consuming and labor-intensive, and this approach also effectively improves on the loss of key information caused by inaccurate rule-based pruning. Filtering non-evidence sentences and pruning the adjacency matrix through dependency relationships thus effectively retains the useful relation information.
In the embodiment of the invention, a multi-head attention mechanism is added to the simplified adjacency matrix to obtain a plurality of weight matrices Ã^(1), …, Ã^(M), where each weight matrix Ã^(k) ∈ R^(N×N); the value of a matrix element may be any number between 0 and 1, and Ã^(k)_ij represents the weight of the edge from the i-th node to the j-th node.
S240: and respectively carrying out graph convolution operation on the initial vector and each weight matrix to obtain a corresponding output result.
In the embodiment of the invention, the initial vector and each weight matrix are respectively subjected to graph convolution operation to obtain corresponding output results, so that a plurality of groups of output results are obtained.
In the embodiment of the invention, the multi-head attention mechanism is used for calculating the weight matrix, so that the model can jointly process information from different subspaces.
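A hedged sketch of S230-S240 follows: each head scores node pairs, masks the scores to the edges of the simplified adjacency matrix, and normalizes each row into [0, 1] with a softmax, after which a separate graph convolution is run with each head's weight matrix. The scaled dot-product scoring is an assumed choice, since the patent does not fix the attention function; all parameter names are illustrative.

```python
# Sketch: per-head weight matrices from the pruned adjacency, then per-head convolution.
import numpy as np

def attention_weight_matrices(X, A, Wq_list, Wk_list):
    heads = []
    for Wq, Wk in zip(Wq_list, Wk_list):
        q, k = X @ Wq, X @ Wk
        scores = q @ k.T / np.sqrt(q.shape[1])          # pairwise edge scores
        scores = np.where(A > 0, scores, -1e9)          # keep only pruned-matrix edges
        e = np.exp(scores - scores.max(axis=1, keepdims=True))
        heads.append(e / e.sum(axis=1, keepdims=True))  # rows normalized into [0, 1]
    return heads                                        # one weight matrix per head

def multi_head_graph_conv(X, A, Wq_list, Wk_list, W, b):
    # One graph convolution per head, yielding the multiple groups of output results.
    return [np.maximum(Ah @ X @ W + b, 0.0)
            for Ah in attention_weight_matrices(X, A, Wq_list, Wk_list)]
```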
S250: and integrating the multiple groups of output results through a linear combination layer to obtain an integrated output result.
S260: and inputting the integration output result to a full-connection layer, and outputting the relation between the target entity pairs.
The description of other steps in the embodiments of the present invention may refer to the description of the foregoing embodiments, and will not be repeated.
The second related-art method, document-level relation extraction based on dependency relationships, mainly merges the document dependency tree into a relation model to establish relations between sentences: it builds a different graph for each type of relation between sentences, performs an independent graph convolution operation on each graph, and then sums the results of the graphs to obtain the final relation result for the entity pair. The disadvantage of this approach is that document dependency trees tend to be large in scale and sparse, so a great deal of time and effort is spent in model training, while rule-based pruning strategies may cause the loss of important information, ultimately affecting the prediction results.
Document-level relation extraction is more difficult than sentence-level relation extraction, because the effectiveness of syntactic analysis drops sharply when the text scope expands from a sentence to a document: the subjects of the sentences in a document are not the same and may even be unrelated, and the increase in text length also greatly increases the difficulty of analysis. A common approach is therefore to first perform syntactic analysis on individual sentences, then introduce additional information such as coreference relations and adjacency relations to build connections between different sentences, and to take the result as the result of the syntactic analysis of the document. To avoid using a large-scale and relatively sparse document dependency tree, the document dependency tree needs to be pruned. A rule-based pruning strategy may cause the loss of important information, so the present invention can use predicted evidence sentences from the related art to prune the document dependency tree, retaining the important information of the document while reaching a minimal dependency-tree scale. The method selects a graph convolutional neural network model to encode the document, and adding the multi-head attention mechanism in the graph convolution module can mitigate the blurring of node features that arises, as the number of graph convolution layers increases, from each node propagating its information to its neighbor nodes indiscriminately, thereby improving prediction accuracy. The multi-head attention mechanism is also used to extract richer information, reduce the scale and parameter count of the dot-product operations, and prevent the vanishing-gradient phenomenon.
Fig. 3 is a block diagram of a document-level relationship extraction apparatus according to an embodiment of the present invention, where the apparatus includes: a first building module 310, a second building module 320, and a relationship extraction module 330.
A first construction module 310 for constructing an initial vector based on words in a document;
a second construction module 320, configured to perform syntactic analysis on the document, and construct an adjacency matrix corresponding to the initial vector based on the dependency relationship between words;
and the relation extraction module 330 is configured to input the initial vector and the adjacency matrix to a graph convolution neural network model with a multi-head attention mechanism added, so as to obtain a document-level relation extraction result.
Optionally, inputting the initial vector and the adjacency matrix into the graph convolutional neural network model with the added attention mechanism to obtain the relationship between two target words includes:
inputting the initial vector and the adjacency matrix to a graph convolution module added with a multi-head attention mechanism to obtain a plurality of groups of output results;
integrating the multiple groups of output results through a linear combination layer to obtain an integrated output result;
and inputting the integration output result to a full-connection layer, and outputting the relation between the target entity pairs.
Optionally, the inputting the initial vector and the adjacency matrix to a graph convolution module adding a multi-head attention mechanism to obtain multiple groups of output results includes:
simplifying the adjacency matrix based on a document dependency tree, and adding a multi-head attention mechanism to the simplified adjacency matrix to obtain a plurality of weight matrices;
and respectively carrying out graph convolution operation on the initial vector and each weight matrix to obtain a corresponding output result.
Optionally, the simplifying the adjacency matrix based on the document dependency tree includes:
pruning the non-evidence sentence through the document dependency tree;
in the adjacency matrix, setting the position corresponding to the dependency relationship among words in the pruned non-evidence sentence as n1; wherein n1 represents non-adjacent.
Optionally, the constructing the adjacency matrix corresponding to the initial vector based on the dependency relationship between words includes:
if the i-th word is the dependency object of the j-th word, then A_ij = n2; otherwise A_ij = n1, wherein A_ij is the value at the i-th row and j-th column of the adjacency matrix A;
in the adjacency matrix, if i < N and j = i - 1, then A_ij = n2;
in the adjacency matrix, if i < N, then A_ii = n1;
if a coreference relation holds between the i-th word and the j-th word, then A_ij = n2 and A_ji = n2;
setting the adjacency values of all words in non-evidence sentences to n1 in the adjacency matrix;
in the adjacency matrix, setting the positions corresponding to the dependency relationships between nodes on adjacent-sentence edges to n2, the adjacent-sentence edges comprising edges obtained by ordering the supporting evidence sentences and connecting the root nodes of adjacent supporting evidence sentences; wherein n1 denotes non-adjacent and n2 denotes adjacent, and the evidence sentences are characterized by a document dependency tree.
Optionally, the constructing an initial vector based on words in the document includes:
and encoding words in the document to obtain encoding information, and constructing an initial vector based on the encoding information.
Optionally, the inputting the integration output result to the full connection layer, outputting the relationship between the target entity pair, includes:
extracting, from the integrated output result, the representation information belonging to the head entity and to the tail entity respectively, and performing a max-pooling operation on each to obtain the head-entity representation information and the tail-entity representation information;
and classifying the relationship between the head-entity representation information and the tail-entity representation information through a fully connected layer to obtain the relationship between the head entity and the tail entity.
The device provided by the embodiment of the invention can execute the method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the method.
Fig. 4 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any other suitable processor, controller, or microcontroller. The processor 11 performs the various methods and processes described above, such as the document-level relation extraction method.
In some embodiments, the document-level relationship extraction method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the document level relation extracting method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the document-level relationship extraction method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network; the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the defects of difficult management and weak service scalability in traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A document-level relationship extraction method, comprising:
constructing an initial vector based on words in the document;
performing syntactic analysis on the document, and constructing an adjacency matrix corresponding to the initial vector based on the dependency relationships among words;
and inputting the initial vector and the adjacency matrix into a graph convolution neural network model added with a multi-head attention mechanism to obtain a document-level relation extraction result.
2. The method of claim 1, wherein inputting the initial vector and the adjacency matrix into the graph convolutional neural network model with the added attention mechanism to obtain a relationship between two target words comprises:
inputting the initial vector and the adjacency matrix to a graph convolution module added with a multi-head attention mechanism to obtain a plurality of groups of output results;
integrating the multiple groups of output results through a linear combination layer to obtain an integrated output result;
and inputting the integration output result to a full-connection layer, and outputting the relation between the target entity pairs.
3. The method of claim 2, wherein said inputting the initial vector and the adjacency matrix to a graph convolution module that adds a multi-headed attention mechanism results in a plurality of sets of output results, comprising:
simplifying the adjacency matrix based on a document dependency tree, and adding a multi-head attention mechanism to the simplified adjacency matrix to obtain a plurality of weight matrices;
and respectively carrying out graph convolution operation on the initial vector and each weight matrix to obtain a corresponding output result.
4. A method according to claim 3, wherein said simplifying said adjacency matrix based on a document dependency tree comprises:
pruning the non-evidence sentence through the document dependency tree;
in the adjacency matrix, setting the position corresponding to the dependency relationship among words in the pruned non-evidence sentence as n1; wherein n1 represents non-adjacent.
5. The method of claim 1, wherein the constructing the adjacency matrix corresponding to the initial vector based on the dependencies between words comprises:
if the i-th word is the dependency object of the j-th word, then A_ij = n2; otherwise A_ij = n1, wherein A_ij is the value at the i-th row and j-th column of the adjacency matrix A;
in the adjacency matrix, if i < N and j = i - 1, then A_ij = n2;
in the adjacency matrix, if i < N, then A_ii = n1;
if a coreference relation holds between the i-th word and the j-th word, then A_ij = n2 and A_ji = n2;
setting the adjacency values of all words in non-evidence sentences to n1 in the adjacency matrix;
in the adjacency matrix, setting the positions corresponding to the dependency relationships between nodes on adjacent-sentence edges to n2, the adjacent-sentence edges comprising edges obtained by ordering the supporting evidence sentences and connecting the root nodes of adjacent supporting evidence sentences; wherein n1 denotes non-adjacent and n2 denotes adjacent, and the evidence sentences are characterized by a document dependency tree.
6. The method of claim 1, wherein constructing the initial vector based on words in the document comprises:
and encoding words in the document to obtain encoding information, and constructing an initial vector based on the encoding information.
7. The method of claim 2, wherein inputting the integrated output result to a fully-connected layer, outputting a relationship between a target entity pair, comprises:
extracting, from the integrated output result, the representation information belonging to the head entity and to the tail entity respectively, and performing a max-pooling operation on each to obtain the head-entity representation information and the tail-entity representation information;
and classifying the relationship between the head-entity representation information and the tail-entity representation information through a fully connected layer to obtain the relationship between the head entity and the tail entity.
8. A document-level relationship extraction apparatus, comprising:
a first construction module for constructing an initial vector based on words in the document;
the second construction module is used for carrying out syntactic analysis on the document and constructing an adjacency matrix corresponding to the initial vector based on the dependency relationship among words;
and the relation extraction module is used for inputting the initial vector and the adjacency matrix into a graph convolution neural network model added with a multi-head attention mechanism to obtain a document-level relation extraction result.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the method of any one of claims 1-7.
CN202211093370.5A 2022-09-08 2022-09-08 Document-level relation extraction method, device, equipment and storage medium Pending CN116187310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211093370.5A CN116187310A (en) 2022-09-08 2022-09-08 Document-level relation extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211093370.5A CN116187310A (en) 2022-09-08 2022-09-08 Document-level relation extraction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116187310A (en) 2023-05-30

Family

ID=86442950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211093370.5A Pending CN116187310A (en) 2022-09-08 2022-09-08 Document-level relation extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116187310A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595992A (en) * 2023-07-19 2023-08-15 江西师范大学 Single-step extraction method for terms and types of binary groups and model thereof
CN116595992B (en) * 2023-07-19 2023-09-19 江西师范大学 Single-step extraction method for terms and types of binary groups and model thereof


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination