CN114118088A - Document level entity relation extraction method and device based on hypergraph convolutional neural network - Google Patents


Info

Publication number
CN114118088A
Authority
CN
China
Prior art keywords
entity
word
gcnn
hypergraph
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111241687.4A
Other languages
Chinese (zh)
Inventor
刘杰
华浩宇
金泰松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Capital Normal University
Original Assignee
Xiamen University
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University, Capital Normal University filed Critical Xiamen University
Priority to CN202111241687.4A priority Critical patent/CN114118088A/en
Publication of CN114118088A publication Critical patent/CN114118088A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a document-level entity relation extraction method and device based on a hypergraph convolutional neural network, wherein the method comprises the following steps: for a triple, acquiring the hypergraph structure corresponding to the triple by adopting an HG-GCNN model; for the hypergraph structure, obtaining a vector of each word in the document t under each entity of the entity pair whose relation is to be predicted; and obtaining the entity relationship between the first entity e1 and the second entity e2 based on the vector of each word in the document under each entity of the entity pair whose relation is to be predicted. The HG-GCNN model is a pre-established and trained model comprising a node information construction layer, a hypergraph construction layer, a GCNN coding layer and an inference and judgment layer; the node information construction layer and the hypergraph construction layer perform the process of acquiring the hypergraph structure, the GCNN coding layer performs the process of acquiring the vector of each word of the document under each entity, and the inference and judgment layer performs the process of acquiring the prediction result.

Description

Document level entity relation extraction method and device based on hypergraph convolutional neural network
Technical Field
The invention relates to the technical field of artificial intelligence natural language processing, in particular to a document level entity relation extraction method and device based on a hypergraph convolutional neural network.
Background
With the spread of artificial intelligence in the field of natural language processing, machine translation, discourse understanding, knowledge graphs and the like have become popular research topics. Research in these areas relies on one important task: entity relation extraction, i.e. mining the relation information of entity pairs from a document. The significance of entity relation extraction is mainly reflected in the following: its results can be applied to automatically expanding and constructing knowledge graphs, providing users with a knowledge base for information retrieval; its results can also serve as a data source for building automatic question-answering systems, while providing theoretical support for machine translation and discourse understanding. The complexity of document-level entity relation extraction is much greater than that of sentence-level entity relation extraction, and the main research difficulties currently faced are as follows: the diversity of language expression brings great difficulty to entity recognition in document-level entity relation extraction; in document-level extraction, the number of mentions of an entity is often large, and how to relate the mentions to one another while avoiding the noise among different mentions of the same entity is one of the pain points of research.
The complexity of discourse structure brings great difficulty to understanding the document in document-level entity relation extraction: because of the intricate writing techniques and modes of expression of an article, the connections between entities often run through the article from beginning to end, and how to effectively model the discourse structure of a document is another pain point of research.
Traditional relation extraction models mainly fall into two classes. One class is sequence-based relation extraction models, which model relations based on the sequential order of words; the other class is dependency-based relation extraction models, which model entity relations based on dependency relations.
Sequence-based relation extraction methods extract features of different levels through networks of different levels, and finally realize relation classification between entity pairs by associating these features. That is, different models are used for intra-sentence and inter-sentence relations, and the entity relation is predicted by combining local and global feature information. An existing method uses a CNN to encode intra-sentence information, encodes the aggregated inter-sentence information through maximum entropy, and processes the features of different levels step by step. However, this method splits the document structure apart, and how to handle both the intra-sentence and inter-sentence relations of document extraction with a unified model remains a major research difficulty.
In the prior art, relations across long text are encoded by an LSTM, unifying the structure of the model: the original text is treated as a sequence, its structure is encoded by a long short-term memory network, and the important information of sentences is extracted through an attention mechanism. Although intra-sentence and inter-sentence connections are considered, the dependencies between texts are not embedded in these connections, which is a limitation; moreover, the parameters are too numerous, the vanishing-gradient problem often occurs during training, and the model performs relatively poorly.
Disclosure of Invention
Technical problem to be solved
In view of the above disadvantages and shortcomings of the prior art, the present invention provides a method and an apparatus for document-level entity relation extraction based on a hypergraph convolutional neural network, which solve the technical problem in the prior art that document-level entity relation extraction is inaccurate because only a single document graph structure is considered and the global information may be incomplete.
(II) technical scheme
In order to achieve the above purpose, the main technical scheme adopted by the invention is as follows:
in a first aspect, an embodiment of the present invention provides a document-level entity relationship extraction method based on a hypergraph convolutional neural network, including:
S1, for a triple, acquiring the hypergraph structure corresponding to the triple by adopting an HG-GCNN model;
wherein the triple includes: a document t;
and a first entity e1 and a second entity e2 whose relation is to be predicted in the document t;
wherein the document t comprises n words;
there being at least 2 entities among the n words;
S2, for the hypergraph structure, obtaining a vector of each word in the document t under each entity of the entity pair whose relation is to be predicted;
S3, obtaining the entity relationship between the first entity e1 and the second entity e2 based on the vector of each word in the document under each entity of the entity pair whose relation is to be predicted;
the HG-GCNN model being a pre-established and trained model comprising a node information construction layer, a hypergraph construction layer, a GCNN coding layer and an inference and judgment layer;
the node information construction layer and the hypergraph construction layer performing the process of acquiring the hypergraph structure, the GCNN coding layer performing the process of acquiring the vector of each word of the document under each entity, and the inference and judgment layer performing the process of acquiring the prediction result.
Preferably, S1 specifically includes:
the node information construction layer of the HG-GCNN model obtains, for the words in each paragraph of the document t in the triple, the node information of the graph structure according to their distances to the first entity e1 and the second entity e2 respectively;
and the hypergraph construction layer of the HG-GCNN model constructs a hypergraph structure for the document t by adopting a preset composition strategy.
Preferably,
the node information construction layer of the HG-GCNN model obtaining, for the words in each paragraph of the document t in the triple, the node information of the graph structure according to their distances to the first entity e1 and the second entity e2 respectively, specifically includes:
for any paragraph of the document t in the triple, judging whether the first entity e1 appears in the paragraph; if not, setting the importance of each word in the paragraph for the first entity e1 to 0; if yes, constructing a group of hyperedges connecting any word in the paragraph with all the mentions of the first entity e1 in the paragraph, the importance of any word in the paragraph for the first entity e1 being the sum of the reciprocals of the distances between that word and all the mentions of the first entity e1 in the paragraph;
for any paragraph of the document t in the triple, judging whether the second entity e2 appears in the paragraph; if not, setting the importance of each word in the paragraph for the second entity e2 to 0; if yes, constructing a group of hyperedges connecting any word in the paragraph with all the mentions of the second entity e2 in the paragraph, the importance of any word in the paragraph for the second entity e2 being the sum of the reciprocals of the distances between that word and all the mentions of the second entity e2 in the paragraph;
and taking the importance of any word in any paragraph of the document t in the triple for the first entity e1 and for the second entity e2 respectively as the information of the node corresponding to that word.
Preferably, the hypergraph construction layer of the HG-GCNN model constructing a hypergraph structure by adopting a preset composition strategy specifically includes:
the hypergraph structure includes: syntax edges, coreference edges, adjacent-word edges, adjacent-sentence edges and self-loop edges;
the syntax edges are: for each sentence in the document t, a hyperedge connecting any word with the words that have a syntactic relation to it;
the coreference edges are: a group of hyperedges constructed from the entities in the document t and their corresponding mentions;
the adjacent-word edges are: for each sentence in the document t, a hyperedge constructed from the words in that sentence;
the adjacent-sentence edges are: for each paragraph in the document t, a group of hyperedges constructed from all the entities in that paragraph;
the self-loop edges are: for each word in the document t, a hyperedge associating the word with itself;
in the hypergraph structure, the relation value between any hyperedge and a word associated with it being set to 1, and the relation value between any hyperedge and a word not associated with it being set to 0.
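The composition strategy above can be sketched in code. This is an illustrative sketch only: the toy text, the assumed mention positions and the helper names are not from the patent, and only three of the five edge types are shown.

```python
# Build some of the hyperedge types for a toy two-sentence paragraph.
# A word is identified by its (sentence index, word index) position.

sentences = [["Alice", "works", "in", "Paris"],
             ["She", "likes", "it"]]
# Assumed coreference: "Alice" (0,0) and "She" (1,0) mention the same entity.
mentions = {"entity_1": [(0, 0), (1, 0)]}

hyperedges = []

# Adjacent-word edges: one hyperedge per sentence over all its words.
for s_idx, sent in enumerate(sentences):
    hyperedges.append(("adjacent_word", {(s_idx, w) for w in range(len(sent))}))

# Coreference edges: one hyperedge per entity over all its mentions.
for entity, pos_list in mentions.items():
    hyperedges.append(("coreference", set(pos_list)))

# Self-loop edges: one hyperedge per word containing only that word,
# so that each word's own semantics survive the aggregation.
for s_idx, sent in enumerate(sentences):
    for w in range(len(sent)):
        hyperedges.append(("self_loop", {(s_idx, w)}))
```

Each hyperedge is a typed set of word positions; syntax and adjacent-sentence edges would contribute further sets in the same shape.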
Preferably, S2 specifically includes:
the GCNN coding layer of the HG-GCNN model learns the hypergraph structure to obtain, for the word corresponding to each node in the hypergraph structure, its vector representation h_i^{e1} under the first entity e1 and its vector representation h_i^{e2} under the second entity e2.
Preferably, the GCNN coding layer of the HG-GCNN model learning the hypergraph structure to obtain, for the word corresponding to each node in the hypergraph structure, its vector representation h_i^{e1} under the first entity e1 and its vector representation h_i^{e2} under the second entity e2, specifically includes:
the GCNN coding layer of the HG-GCNN model generates bias parameters and hyperedge weights for the hypergraph structure;
each layer of the GCNN coding layer of the HG-GCNN model learns the hypergraph structure; based on the hypergraph structure and the node information, it uses formula (1) to let the information of the neighbor nodes of each node in the hypergraph structure interact with the information of that node, iteratively updating the bias parameters and hyperedge weights of the word corresponding to each node and generating new bias parameters and hyperedge weights; after the k layers of the GCNN coding layer, the vector representation h_i^{e1} under the first entity e1 and the vector representation h_i^{e2} under the second entity e2 of the word corresponding to each node are finally obtained.
The formula (1) is:
h_i^{(k)} = f( Σ_{l∈L} Σ_{u∈V(i)} ( W_l^{(k-1)} · h_u^{(k-1)} + b_l^{(k-1)} ) )    (1)
wherein h_i^{(k)} is the representation of the i-th word w_i in the hypergraph structure obtained from the k-th layer of the GCNN coding layer, comprising: the vector representation h_i^{e1} of the word w_i under the first entity e1 and its vector representation h_i^{e2} under the second entity e2; V(i) is the set of neighbor nodes of the i-th word w_i in the hypergraph structure, the neighbor nodes of a word being all the nodes on the hyperedges connected to that word; W_l^{(k-1)} and b_l^{(k-1)} are the weight and bias parameters, at layer k-1, of the hyperedges of type l between node i and node u in the hypergraph structure; and l ranges over the hyperedge types: syntax edge, coreference edge, adjacent-word edge, adjacent-sentence edge and self-loop edge.
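A single propagation step of this kind can be sketched as follows. This is a minimal sketch under stated assumptions: the dimensions, the random parameters, the toy neighbor lists and the choice of ReLU as the nonlinearity f are illustrative, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_nodes = 4, 3
edge_types = ["syntax", "coref", "adj_word", "adj_sent", "self"]

# Per-edge-type weight matrices W_l and bias vectors b_l (formula (1)).
W = {l: rng.normal(size=(dim, dim)) * 0.1 for l in edge_types}
b = {l: np.zeros(dim) for l in edge_types}

# neighbors[i] lists (neighbor index u, hyperedge type l) pairs for node i:
# all nodes sharing a hyperedge with i, labeled by that edge's type.
neighbors = {
    0: [(1, "adj_word"), (0, "self")],
    1: [(0, "adj_word"), (2, "coref"), (1, "self")],
    2: [(1, "coref"), (2, "self")],
}

def gcnn_layer(h):
    """One step: h_i <- relu( sum over (u, l) in V(i) of W_l @ h_u + b_l )."""
    out = np.zeros_like(h)
    for i, nbrs in neighbors.items():
        msg = sum(W[l] @ h[u] + b[l] for u, l in nbrs)
        out[i] = np.maximum(msg, 0.0)  # ReLU as the nonlinearity f
    return out

h0 = rng.normal(size=(n_nodes, dim))  # initial node information
h1 = gcnn_layer(h0)                   # representations after one layer
```

Stacking k such calls yields the k-layer encoding described above; in the full model the two target entities would each get their own representation stream.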
Preferably,
the GCNN coding layer of the HG-GCNN model, based on the hypergraph structure, also obtains the hyperedge value of each hyperedge in the hypergraph structure;
the hyperedge value being the preset bias parameter plus the product of the weight of the hyperedge and the sum of the node information corresponding to each word associated with that hyperedge.
Preferably, S3 specifically includes:
S31, obtaining a confidence score matrix S12 between the first entity e1 and the second entity e2 through the inference and judgment layer of the HG-GCNN model and a fully connected layer;
each position in the confidence score matrix S12 corresponding to a preset entity relationship type;
wherein S12 = (Embedding1 · R) · Embedding2,
where R is the parameter matrix of the fully connected layer, pre-trained by a classification method of multi-instance learning;
Embedding1 is the matrix composed of the vector representations under the first entity e1 of all the words in said hypergraph structure, obtained from layer k of the GCNN coding layer;
Embedding2 is the matrix composed of the vector representations under the second entity e2 of all the words in said hypergraph structure, obtained from layer k of the GCNN coding layer;
S32, determining the entity relationship between the first entity e1 and the second entity e2 according to the confidence score matrix S12 and the preset entity relationship type corresponding to each position in the confidence score matrix S12.
Preferably, S32 specifically includes:
taking the preset entity relationship type corresponding to the position with the maximum value among all the positions of the confidence score matrix S12 as the entity relationship between the first entity e1 and the second entity e2.
In another aspect, this embodiment further provides a document-level entity relationship extraction apparatus based on a hypergraph convolutional neural network, the apparatus including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the document-level entity relationship extraction method based on a hypergraph convolutional neural network according to any of the above.
(III) advantageous effects
The beneficial effects of the invention are as follows: compared with the prior art, in the method and device for document-level entity relation extraction based on a hypergraph convolutional neural network, because the HG-GCNN model is adopted, the hypergraph structure can be constructed according to a preset composition strategy, yielding document-level hypergraphs of different levels on hyperedges of different granularities; the GCNN coding layer in the HG-GCNN model can thus learn more information from the hypergraph structure, making the classification of entity relations more accurate.
Drawings
FIG. 1 is a flowchart of a document level entity relationship extraction method based on a hypergraph convolutional neural network according to the present invention;
FIG. 2 is an overview schematic diagram of the HG-GCNN model of the present invention;
FIG. 3 is a node information construction diagram in the HG-GCNN model of the present invention;
FIG. 4 is a schematic diagram of the hypergraph structure in the HG-GCNN model of the present invention;
FIG. 5 is a schematic diagram of the GCNN coding layer in the HG-GCNN model of the present invention;
FIG. 6 is a schematic diagram of the variation of each evaluation index on the training set with the number of training iterations during experimental verification in the embodiment of the present invention;
FIG. 7 is a schematic diagram of the variation of each evaluation index on the test set with the number of training iterations during experimental verification in the embodiment of the present invention.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Referring to fig. 1, the present embodiment provides a document-level entity relationship extraction method based on a hypergraph convolutional neural network, including:
S1, for the triple, acquiring the hypergraph structure corresponding to the triple by adopting the HG-GCNN model.
Wherein the triple includes: a document t.
And the first entity e1 and the second entity e2 whose relation is to be predicted in the document t.
Wherein n words are included in the document t.
There are at least 2 entities among the n words.
In the practical application of this embodiment, the input of the HG-GCNN model is a triple (e1, e2, t), where t is a document, e1 and e2 are the entity pair in the document t whose relation needs to be predicted, and the words in the document t are [w1, w2, …, wn]; the output is the entity relation.
S2, for the hypergraph structure, obtaining the vector of each word in the document t under each entity of the entity pair whose relation is to be predicted.
S3, obtaining the entity relationship between the first entity e1 and the second entity e2 based on the vector of each word in the document under each entity of the entity pair whose relation is to be predicted.
Referring to fig. 2, the HG-GCNN model is a pre-established and trained model including a node information construction layer, a hypergraph construction layer, a GCNN coding layer, and an inference and judgment layer.
The node information construction layer and the hypergraph construction layer perform the process of acquiring the hypergraph structure, the GCNN coding layer performs the process of acquiring the vector of each word of the document under each entity, and the inference and judgment layer performs the process of acquiring the prediction result.
In practical application of this embodiment, S1 specifically includes:
referring to FIG. 3, the node information construction layer of the HG-GCNN model is respectively away from the first entity e for words in each paragraph of the document t in the triples1And a second entity e2And obtaining the node information of the graph structure.
Referring to fig. 4, the hypergraph construction layer of the HG-GCNN model constructs a hypergraph structure by using a preset composition strategy for a document t.
In the practical application of the embodiment, the node information construction layer of the HG-GCNN model is respectively away from the first entity e for words in each paragraph of the document t in the triple1And a second entity e2The obtaining of the node information of the graph structure specifically includes:
aiming at any paragraph in a document t in a triple, judging whether the paragraph has a first entity e1If not, determining that each word in the paragraph is for the first entity e1The degree of importance of (a) is 0; if yes, any word in the paragraph is respectively connected with all the first entities e in the paragraph1Constructs a set of super edges, any word in the paragraph being for the first entity e1Is that any word in the paragraph is respectively associated with all the first entities e in the paragraph1The reciprocal sum of the distances between references.
Aiming at any paragraph in the document t in the triple, judging whether the paragraph has a second entity e2If not, determining that each word in the paragraph is for a second entity e2Is of importance to0; if yes, any word in the paragraph is respectively connected with all second entities e in the paragraph2Constructs a set of super edges, any word in the paragraph being for a second entity e2Is that any word in the paragraph is respectively associated with all the second entities e in the paragraph2The reciprocal sum of the distances between references.
Respectively corresponding any word in any paragraph of the document t in the triple to the first entity e1And a second entity e2The degree of importance of (a) is used as information of a node corresponding to the word.
In a specific application, an entity has multiple textual representations, called mentions, and here the embodiment uses a hypergraph G = <V, E, W> to describe the associations between the words [w1, w2, …, wn] and the mentions. The following algorithm for constructing the importance of words is proposed, each construction taking the paragraphs in a mini-batch as units, where I_{j}^{wi} denotes the importance of the word wi for entity j. The algorithm is described as follows:
Input: a triple (e1, e2, t), where t is a document, e1 and e2 are the entity pair in the document t whose relation needs to be predicted, and wi are the words in the document t.
The method comprises the following steps:
if the mentions of entity e1 are absent, then I_{e1}^{wi} = 0;
else if the mentions of entity e2 are absent, then I_{e2}^{wi} = 0;
else:
construct, for each word wi, a group of hyperedges with all the mentions of e1 among the citations, and a group of hyperedges with the mentions of e2, with wi as the central vertex;
calculate the sum of the reciprocals of the distances between all the vertices and the central vertex on each group of hyperedges as the weight of that hyperedge; among the hyperedges associated with wi, the weight of the hyperedge where the mentions of e1 lie is I_{e1}^{wi}, and the weight of the hyperedge where the mentions of e2 lie is I_{e2}^{wi}.
Output: the importance values (I_{e1}^{wi}, I_{e2}^{wi}) of each word node wi.
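The importance computation above can be sketched as follows. This is an illustrative sketch, not the patent's code: the mention positions are made up, and the guard against zero distance (when the word is itself a mention) is an added assumption.

```python
def word_importance(word_pos, mention_positions):
    """Importance of the word at word_pos for one entity: the sum of
    reciprocal distances to every mention of that entity in the paragraph.
    Returns 0.0 when the entity has no mention in the paragraph."""
    if not mention_positions:
        return 0.0
    # max(..., 1) is an assumed guard: a word that coincides with a
    # mention would otherwise give a zero distance.
    return sum(1.0 / max(abs(word_pos - m), 1) for m in mention_positions)

# Toy paragraph of 6 words; e1 mentioned at positions 0 and 4, e2 absent.
e1_mentions, e2_mentions = [0, 4], []
node_info = [(word_importance(i, e1_mentions),
              word_importance(i, e2_mentions))
             for i in range(6)]
```

Each pair in `node_info` is the (I_{e1}^{wi}, I_{e2}^{wi}) node information carried into the GCNN coding layer.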
In the practical application of this embodiment, referring to fig. 4, the hypergraph construction layer of the HG-GCNN model constructs a hypergraph structure by adopting a preset composition strategy, which specifically includes:
The hypergraph structure includes: syntax edges, coreference edges, adjacent-word edges, adjacent-sentence edges and self-loop edges.
The syntax edges are: for each sentence in the document t, a hyperedge connecting any word with the words that have a syntactic relation to it.
In a specific application, the syntax edge: the syntactic relations existing within sentences are an important way to understand the discourse structure of an article, so for each sentence of the article, a hyperedge is constructed here according to the syntactic relations of the sentence. For the current sentence, the word wi and the words syntactically associated with it are used to construct a group of hyperedges.
The coreference edges are: a group of hyperedges constructed from the entities in the document t and their corresponding mentions.
In a specific application, the coreference edge: the mentions of an entity that occurs multiple times in a document can be an important indicator connecting the context of the article; writing techniques such as foreshadowing and echoing, or general-to-specific organization, often disperse the relations between entities into different paragraphs, so the words mentioning the same entity are constructed as a group of hyperedges to connect the contextual relations across the text. That is, for the current document, each entity ei and its corresponding mentions are constructed as a group of hyperedges.
The adjacent-word edges are: for each sentence in the document t, a hyperedge constructed from the words in that sentence.
In a specific application, the adjacent-word edge: within a sentence, a word and its adjacent words always have some connection and influence, whether a subject-predicate or a verb-object relation; although this cannot be analyzed explicitly, the adjacent words of each sentence are constructed into a group of hyperedges sharing the same hyperedge weight, so as to mine the implicit information in the sentence. The current sentence is selected, and the words wi in it are used to construct a group of hyperedges.
The adjacent-sentence edges are: for each paragraph in the document t, a group of hyperedges constructed from all the entities in that paragraph.
In a specific application, the adjacent-sentence edge: within a paragraph, the expression of semantics is carried not only by the semantics of each sentence but also by the arrangement of and association between the sentences; since sentence semantics are already expressed by the adjacent-word hyperedges constructed within each sentence, the arrangement and association of the sentences in a paragraph can be simplified to the arrangement and association between the entities of the sentences. That is, for the current paragraph, the entities ei of adjacent sentences in the paragraph are constructed as a group of hyperedges.
The self-loop edges are: for each word in the document t, a hyperedge associating the word with itself.
In a specific application, the self-loop edge: in order to highlight the expression of a word's own semantics in the modeling process, every node constructs a hyperedge containing itself during composition, i.e. every node wi in the graph forms a self-loop edge.
In the hypergraph structure, the relation value between any hyperedge and a word associated with it is set to 1, and the relation value between any hyperedge and a word not associated with it is set to 0.
Comparison of edges of different granularities:

                        Syntax edge   Coreference edge   Adjacent-word edge   Adjacent-sentence edge   Self-loop edge
Hyperedge granularity   Sentence      Document           Sentence             Paragraph                Word
Node object             Word          Entity             Word                 Entity                   Word

Vertex-hyperedge relation matrix:

       e1   e2   e3   e4
w1      1    1    0    0
w2      0    0    1    1
w3      0    1    0    1
w4      1    1    1    1
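The 0/1 incidence relation above can be reproduced in a few lines. This sketch is illustrative: the word names and the membership of each hyperedge are chosen to match the example matrix, and in the full model every edge type (syntax, coreference, adjacent-word, adjacent-sentence, self-loop) would contribute such sets.

```python
words = ["w1", "w2", "w3", "w4"]

# Each hyperedge is the set of words it connects.
hyperedges = {
    "e1": {"w1", "w4"},
    "e2": {"w1", "w3", "w4"},
    "e3": {"w2", "w4"},
    "e4": {"w2", "w3", "w4"},
}

# Relation value: 1 if the word is associated with the hyperedge, else 0.
incidence = [[1 if w in hyperedges[e] else 0 for e in sorted(hyperedges)]
             for w in words]
```

The resulting rows are exactly the w1–w4 rows of the vertex-hyperedge relation matrix shown above.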
In the method and device for document-level entity relation extraction based on a hypergraph convolutional neural network of this embodiment, because the HG-GCNN model is adopted, compared with the prior art, the hypergraph structure can be constructed according to a preset composition strategy, yielding document-level hypergraphs of different levels on hyperedges of different granularities; the GCNN coding layer in the HG-GCNN model can learn more information from the hypergraph structure, making the classification of entity relations more accurate.
Referring to fig. 5, in practical application of this embodiment, S2 specifically includes:
the GCNN coding layer of the HG-GCNN model learns the hypergraph structure to obtain, for the word corresponding to each node in the hypergraph structure, its vector representation h_i^{e1} under the first entity e1 and its vector representation h_i^{e2} under the second entity e2.
In practical application of this embodiment, the step in which the GCNN coding layer of the HG-GCNN model learns the hypergraph structure to obtain, for the word corresponding to each node, its vector representation h_i^{e1} under the first entity e1 and h_i^{e2} under the second entity e2 specifically comprises the following:
the GCNN coding layer of the HG-GCNN model generates bias parameters and hyperedge weights for the hypergraph structure.
Each layer of the GCNN coding layer of the HG-GCNN model learns the hypergraph structure; based on the hypergraph structure and the node information, it uses formula (1) to let the information of each node interact with the information of its neighbor nodes, iteratively updating the bias parameters and hyperedge weights of the word corresponding to each node to generate new bias parameters and hyperedge weights. After the k layers of the GCNN coding layer, the word corresponding to each node finally obtains its vector representation h_i^{e1} under the first entity e1 and h_i^{e2} under the second entity e2.
The formula (1) is:

    h_i^(k) = f( Σ_{u ∈ V(i)} ( W_{l(i,u)}^(k-1) · h_u^(k-1) + b_{l(i,u)}^(k-1) ) )    (1)

wherein h_i^(k) is the representation of the i-th word w_i in the hypergraph structure obtained from the k-th layer of the GCNN coding layer, comprising the vector representation h_i^{e1} of the word w_i under the first entity e1 and the vector representation h_i^{e2} under the second entity e2.
V(i) is the set of neighbor nodes of the i-th word w_i in the hypergraph structure; the neighbor nodes of a word are all the nodes connected to the word by a hyperedge.
W_l^(k-1) and b_l^(k-1) are the weight and bias parameters at layer k-1 of the hyperedge of type l between node i and node u in the hypergraph structure.
l is any hyperedge type among grammar edge, co-reference edge, adjacent word edge, adjacent sentence edge, and reflexive edge.
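A minimal sketch of one layer of the type-specific neighborhood aggregation in the spirit of formula (1). All names, shapes, and the choice of tanh for the nonlinearity f are assumptions for illustration, not the patent's exact implementation.

```python
import numpy as np

def gcnn_layer(h, neighbors, edge_type, W, b):
    """One GCNN layer: each node aggregates its neighbors' states through
    type-specific weights and biases, then applies a nonlinearity f.

    h         : (n_nodes, d) node representations from the previous layer
    neighbors : dict node i -> iterable of neighbor nodes V(i)
    edge_type : dict (i, u) -> hyperedge type l connecting i and u
    W, b      : dict type l -> (d, d) weight / (d,) bias for this layer
    """
    n, d = h.shape
    out = np.zeros_like(h)
    for i in range(n):
        acc = np.zeros(d)
        for u in neighbors.get(i, ()):
            l = edge_type[(i, u)]
            acc += W[l] @ h[u] + b[l]   # W_l h_u + b_l, summed over V(i)
        out[i] = np.tanh(acc)           # f: assumed smooth nonlinearity
    return out
```

Stacking k such layers, with the weight and bias dictionaries updated per layer, yields the final per-node representations described above.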
In practical application of this embodiment, the GCNN coding layer of the HG-GCNN model also obtains, based on the hypergraph structure, a hyperedge value for each hyperedge in the hypergraph structure.
The hyperedge value is the hyperedge weight multiplied by the sum of the node information of every word associated with the hyperedge, plus the preset bias parameter.
In practical application of this embodiment, S3 specifically includes:
S31, obtaining, through the inference and judgment layer of the HG-GCNN model and a fully connected layer, a confidence score matrix S12 between the first entity e1 and the second entity e2.
Each position in the confidence score matrix S12 corresponds to a preset entity relationship type.
Wherein,
S12=(Embedding1R)Embedding2
and R is a parameter matrix of the full connection layer after pre-training by adopting a classification method of multi-instance learning.
Embedding1First entity e of all words in said hypergraph structure obtained by layer k of GCNN coding layer1The lower vector represents the composed matrix.
Embedding2Second entity e for all words in said hypergraph structure derived from layer k of GCNN coding layer2The lower vector represents the composed matrix.
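One plausible reading of this scoring step, sketched below: a per-relation bilinear form between the two entity-conditioned representations yields a relation-indexed score vector, and the argmax position gives the predicted relation. The per-relation slicing of R, the shapes, and the function names are assumptions, not the patent's exact factorization.

```python
import numpy as np

def relation_scores(e1_vec, e2_vec, R):
    """Confidence score for each preset relation type r as the bilinear form
    e1^T R_r e2, so each position of the result corresponds to one relation.
    e1_vec, e2_vec: (d,) entity-conditioned vectors; R: (n_relations, d, d)."""
    return np.einsum("i,rij,j->r", e1_vec, R, e2_vec)

def predict_relation(e1_vec, e2_vec, R):
    """Return the relation type at the position holding the maximum score."""
    return int(np.argmax(relation_scores(e1_vec, e2_vec, R)))
```

The second function mirrors the selection rule of step S32, where the preset relation type at the maximum-valued position is taken as the entity relation.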
In a specific application, after the vector representations of the word nodes are obtained with the hypergraph convolutional neural network, inference and judgment are needed to decide the entity relationship, that is, to compute the confidence of each relation. To compute the confidence of each relation, a score matrix R is learned by a feedforward neural network (a fully connected layer) to obtain a relation-indexed confidence score matrix between any two entities.
In a specific application, an entity in a document may have multiple references, and data sets are generally annotated at the entity level rather than the reference level. Consider, for example, a text stating that Yang Zhenning and Weng Fan first knew each other as teacher and student and eventually became a couple. Two triplets can be extracted from such a text, (Yang Zhenning, teacher-student, Weng Fan) and (Yang Zhenning, couple, Weng Fan); but if only the teacher-student relation is to be extracted, the couple instance is noise, and such noise between different references of the same entities is widespread in long texts. Because relations in text intertwine in this way, relation extraction becomes complex, and this embodiment therefore introduces a classification scheme based on multi-instance learning (MIL). In multi-instance learning, instead of treating the associated reference pairs in isolation, the model is trained on all reference pairs in an instance bag, and the instance with the highest confidence is used to perform a single update of the model.
The multi-instance algorithm herein is described in detail as follows:
Input: the training set T.
Steps:
Initialize the parameters of the network and partition the data into several instance bags; each instance bag contains several sentences, and the sentences contain several reference pairs.
Feed the instance bags to the network in batches according to the chosen mini-batch size, and train the model with the set of all reference pairs of each instance bag.
Through a score function, select the reference pair with the highest confidence in each instance bag to represent the current bag, and update the neural network once according to this highest-confidence instance.
Repeat the above process until the model converges or the maximum number of epochs is reached.
Output: the trained model.
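The selection-and-update core of the steps above can be sketched as follows; `score_fn` and `update_fn` are hypothetical stand-ins for the model's confidence scorer and a single optimizer step, not names from the patent.

```python
def mil_update(bags, score_fn, update_fn):
    """Multi-instance learning scheme: within each bag of reference pairs,
    only the highest-confidence instance is used for a single model update."""
    for bag in bags:
        best = max(bag, key=score_fn)  # highest-confidence reference pair
        update_fn(best)                # one update of the network per bag
```

An outer loop would repeat this over mini-batches of bags until convergence or a maximum epoch count, as described above.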
In a specific application, after the node information is computed by the node information construction layer, two term-embedding representations for the two entities are obtained by constructing hyperedges of different granularities in the hypergraph construction layer and encoding with the neighborhood-based hypergraph GCNN; a relation discriminator is then trained in the inference and judgment layer, where a score matrix is trained against the term vector representations to obtain the relation confidence scores of the entity pair. During training, the multi-instance learning scheme is adopted to address the problem of noise between entity reference pairs.
S32, according to the confidence score matrix S12And the confidence score matrix S12Determines the first entity e by the preset entity relationship type corresponding to each position in the first entity e1And a second entity e2The entity relationship of (1).
In practical application of this embodiment, the S32 specifically includes:
the confidence score matrix S12The preset entity relationship type corresponding to the position of the maximum numerical value in all the positions is used as the first entity e1And a second entity e2The entity relationship of (1).
Compared with the prior art, in the document-level entity relation extraction method and device based on the hypergraph convolutional neural network, because the HG-GCNN model is adopted, the hypergraph structure can be constructed according to a preset composition strategy, yielding document-level hypergraphs with hyperedges at different levels of granularity; the GCNN coding layer of the HG-GCNN model can learn more information from the hypergraph structure, making the classification of entity relations more accurate.
A document-level entity relationship extraction device based on a hypergraph convolutional neural network, the device comprising:
at least one processor; and at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform a document-level entity relationship extraction method based on a hypergraph convolutional neural network as any of the above.
Experimental verification
In this embodiment, an experiment is performed on the document-level entity relationship extraction method based on the hypergraph convolutional neural network, where the experiment is as follows:
The experiments used PyTorch as the experimental environment and Adam as the optimizer for the model, with the learning rate set to 0.002. The data sets and their detailed partitioning are as follows:
CDR (Chemical-Disease Relation dataset): the data set contains 1500 documents of chemical-disease association corpora, in which the binary associations between chemicals and diseases are annotated manually at the document level. The 1500 documents were divided into three equally large parts for training, validation, and testing.
GDA (Gene-Disease Association dataset): the data set contains 30192 documents of gene-disease association corpora, in which the binary associations between genes and diseases are labeled at the document level through distant supervision. In the experiments, the 30192 documents were divided into 29192 articles for training (training and validation sets split 4:1) and 1000 articles for testing.
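The two partitions described above can be sketched as follows; the function names are illustrative, and a fixed document order is assumed.

```python
def split_cdr(docs):
    """CDR: divide the documents into three equally large parts for
    training, validation, and testing, as described above."""
    third = len(docs) // 3
    return docs[:third], docs[third:2 * third], docs[2 * third:]

def split_gda(docs):
    """GDA: hold out the last 1000 articles for testing; split the
    remaining articles into training and validation at a 4:1 ratio."""
    train_val, test = docs[:-1000], docs[-1000:]
    cut = len(train_val) * 4 // 5
    return train_val[:cut], train_val[cut:], test
```

For the 1500 CDR documents this gives 500/500/500; for the 30192 GDA documents it gives 29192 training-plus-validation articles and 1000 test articles.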
Results and analysis of the experiments
Relation extraction is in fact a multi-class classification problem: for each class, the current class provides the positive samples and the other classes the negative samples. With TP the number of correctly predicted positives, FP the number of negatives incorrectly predicted as positive, TN the number of correctly predicted negatives, and FN the number of positives incorrectly predicted as negative, the following four evaluation indexes are used herein.
Accuracy: used to evaluate the percentage of correct predictions among all predictions; the higher the accuracy, the better the model. It is computed as in formula (2):

    Accuracy = (TP + TN) / (TP + FP + TN + FN)    (2)
Precision: also called the precision ratio; used to evaluate the percentage of true positives among the predicted positives, avoiding the limitation of the accuracy index when positive and negative samples are imbalanced. The higher the precision, the better the model. It is computed as in formula (3):

    Precision = TP / (TP + FP)    (3)
Recall: also called the recall ratio; used to evaluate how many positive examples in the sample were successfully predicted. The higher the recall, the better the model. It is computed as in formula (4):

    Recall = TP / (TP + FN)    (4)
F1 value: evaluates the model under the default assumption that precision and recall are equally important; F1 is high only when both precision and recall are high. It is computed as in formula (5):

    F1 = 2 · Precision · Recall / (Precision + Recall)    (5)
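The four evaluation indexes of formulas (2)-(5), written out directly from the TP/FP/TN/FN counts:

```python
def accuracy(tp, fp, tn, fn):
    # Formula (2): correct predictions over all predictions.
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    # Formula (3): true positives over predicted positives.
    return tp / (tp + fp)

def recall(tp, fn):
    # Formula (4): true positives over actual positives.
    return tp / (tp + fn)

def f1(p, r):
    # Formula (5): harmonic mean of precision and recall.
    return 2 * p * r / (p + r)
```

For example, with TP=8, FP=2, TN=82, FN=8: accuracy is 0.9, precision 0.8, and recall 0.5.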
This example performs 10 epochs of training on the data set, and obtains the following results:
it can be seen that the HG-GCNN model in this embodiment works well for both data sets.
In this embodiment, the variation of each evaluation index of the HG-GCNN model on the CDR training and test sets over 20 epochs was examined: the convergence rate of the model is measured by the variation of the indexes on the training set, and the generalization of the model by their variation on the test set.
Referring to fig. 6, the HG-GCNN model converges very quickly during training: each training index essentially stabilizes after 10 epochs, and quite good results are already obtained after 5 epochs. The fit to the training data keeps improving; the loss decreases monotonically until it stabilizes, while accuracy and recall increase monotonically.
Referring to fig. 7, the HG-GCNN model also generalizes well on the test set without overfitting; it essentially converges to a stable state around 10 epochs, in sync with its state on the training set. Early in training, the indexes differ markedly: accuracy and precision are close, while recall is low. This indicates that the model quickly learns to identify entity relations precisely during training, but further training is needed to raise the recall of entity relations.
Comparative experiment
To show the superiority of the HG-GCNN model in this example, comparative experiments against classical methods were performed on the CDR dataset. The compared algorithms include a traditional SVM model, an LSTM model based on recurrent neural networks, a CNN model based on convolutional neural networks, and a Transformer model.
Precision (P), recall (R), and the F1 value were used as evaluation indexes of the models; the experimental results are shown below.
The HG-GCNN model in this embodiment performs best, compared with the attention-based CNN model and the LSTM model that models the document graph through a bidirectional tree structure. Relative to the CNN model, recall (R) improves greatly, by about 12%, and precision improves to some extent; relative to the RNN model, precision improves considerably, by about 6%, and recall improves to some extent. This shows that introducing hyperedges of different granularities clearly improves the network's performance.
Ablation experiment
In the ablation experiment, a Normal Graph Layer was constructed to ablate the hypergraph structure layer. The results, shown in the table below, indicate that the hypergraph structure layer brings large gains in both precision and recall: an improvement of about 9% in precision (P) and about 6% in recall (R).
In this embodiment, on the one hand, the HG-GCNN model achieves good results on both the CDR and GDA data sets in tests on different data sets; on the other hand, it clearly improves precision and recall in the comparative experiments against other models, and the ablation experiment on the hypergraph structure layer shows that the improvement brought by this layer is significant. This demonstrates that the model can extract effective information from the sentence-, paragraph-, and document-level hypergraph structure. Meanwhile, the HG-GCNN model converges quickly, in about 10 epochs, which demonstrates that introducing the hypergraph greatly simplifies the document graph structure and improves the efficiency of model training.
Since the system described in the above embodiment of the present invention is a system used for implementing the method of the above embodiment of the present invention, a person skilled in the art can understand the specific structure and the modification of the system based on the method described in the above embodiment of the present invention, and thus the detailed description is omitted here. All systems adopted by the method of the above embodiments of the present invention are within the intended scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third and the like are for convenience only and do not denote any order. These words are to be understood as part of the name of the component.
Furthermore, it should be noted that in the description of the present specification, the description of the term "one embodiment", "some embodiments", "examples", "specific examples" or "some examples", etc., means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the claims should be construed to include preferred embodiments and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention should also include such modifications and variations.

Claims (10)

1. A document level entity relation extraction method based on a hypergraph convolutional neural network is characterized by comprising the following steps:
s1, aiming at the triple, obtaining a hypergraph structure corresponding to the triple by adopting an HG-GCNN model;
wherein the triplet comprises: a document t; and the first entity e1 and the second entity e2 of the relation to be predicted in the document t;
wherein the document t comprises n words, among which there are at least 2 entities;
S2, for the hypergraph structure, obtaining a vector of each word in the document t under each entity of the entity pair of the relation to be predicted;
S3, obtaining the entity relationship between the first entity e1 and the second entity e2 based on the vector of each word in the document under each entity of the entity pair of the relation to be predicted;
wherein the HG-GCNN model is a pre-established and trained model comprising a node information construction layer, a hypergraph construction layer, a GCNN coding layer, and an inference and judgment layer;
the node information construction layer and the hypergraph construction layer perform the process of obtaining the hypergraph structure, the GCNN coding layer performs the process of obtaining the vector of each word of the document under each entity, and the inference and judgment layer performs the process of obtaining the prediction result.
2. The method according to claim 1, wherein S1 specifically comprises:
the node information construction layer of the HG-GCNN model obtains the node information of the graph structure from the distances of the words in each paragraph of the document t in the triplet to the first entity e1 and to the second entity e2, respectively;
and the hypergraph construction layer of the HG-GCNN model adopts a preset composition strategy to construct a hypergraph structure aiming at the document t.
3. The method of claim 2, wherein the node information construction layer of the HG-GCNN model obtaining the node information of the graph structure from the distances of the words in each paragraph of the document t in the triplet to the first entity e1 and the second entity e2 specifically comprises:
for any paragraph of the document t in the triplet, judging whether the paragraph contains the first entity e1; if not, determining that the importance degree of each word in the paragraph for the first entity e1 is 0; if so, each word in the paragraph and all references of the first entity e1 in the paragraph construct a set of hyperedges, and the importance degree of any word in the paragraph for the first entity e1 is the sum of the reciprocals of the distances between that word and each reference of the first entity e1 in the paragraph;
for any paragraph of the document t in the triplet, judging whether the paragraph contains the second entity e2; if not, determining that the importance degree of each word in the paragraph for the second entity e2 is 0; if so, each word in the paragraph and all references of the second entity e2 in the paragraph construct a set of hyperedges, and the importance degree of any word in the paragraph for the second entity e2 is the sum of the reciprocals of the distances between that word and each reference of the second entity e2 in the paragraph;
taking the importance degrees of any word in any paragraph of the document t in the triplet for the first entity e1 and the second entity e2, respectively, as the information of the node corresponding to that word.
4. The method according to claim 3, wherein the hypergraph construction layer of the HG-GCNN model adopting a preset composition strategy to construct the hypergraph structure specifically comprises:
the hypergraph structure comprises: grammar edges, co-reference edges, adjacent word edges, adjacent sentence edges, and reflexive edges;
the grammar edge is: a hyperedge connecting any word in each sentence of the document t with the words having a grammatical association with it;
the co-reference edge is: a set of hyperedges constructed from the entities in the document t and their corresponding references;
the adjacent word edge is: for each sentence in the document t, a hyperedge constructed from the words in that sentence;
the adjacent sentence edge is: for each paragraph in the document t, a set of hyperedges constructed from all entities in that paragraph;
the reflexive edge is: for each word in the document t, a hyperedge associated with the word itself;
in the hypergraph structure, a relation value between any hyperedge and a word which is associated with the hyperedge is set to be 1, and a relation value between any hyperedge and a word which is not associated with the hyperedge is set to be 0.
5. The method according to claim 4, wherein S2 specifically comprises:
the GCNN coding layer of the HG-GCNN model learns the hypergraph structure to obtain, for the word corresponding to each node in the hypergraph structure, its vector representation h_i^{e1} under the first entity e1 and its vector representation h_i^{e2} under the second entity e2.
6. The method of claim 5, wherein the GCNN coding layer of the HG-GCNN model learning the hypergraph structure to obtain, for the word corresponding to each node, its vector representation h_i^{e1} under the first entity e1 and h_i^{e2} under the second entity e2 specifically comprises:
generating, by the GCNN coding layer of the HG-GCNN model, bias parameters and hyperedge weights for the hypergraph structure;
learning the hypergraph structure by each layer of the GCNN coding layer of the HG-GCNN model and, based on the hypergraph structure and the node information, using formula (1) to let the information of each node interact with the information of its neighbor nodes, iteratively updating the bias parameters and hyperedge weights of the word corresponding to each node to generate new bias parameters and hyperedge weights, and finally, after the k layers of the GCNN coding layer, obtaining for the word corresponding to each node its vector representation h_i^{e1} under the first entity e1 and h_i^{e2} under the second entity e2;
the formula (1) being:

    h_i^(k) = f( Σ_{u ∈ V(i)} ( W_{l(i,u)}^(k-1) · h_u^(k-1) + b_{l(i,u)}^(k-1) ) )    (1)

wherein h_i^(k) is the representation of the i-th word w_i in the hypergraph structure obtained from the k-th layer of the GCNN coding layer, comprising the vector representation h_i^{e1} of the word w_i under the first entity e1 and the vector representation h_i^{e2} under the second entity e2;
V(i) is the set of neighbor nodes of the i-th word w_i in the hypergraph structure, the neighbor nodes of a word being all the nodes connected to the word by a hyperedge;
W_l^(k-1) and b_l^(k-1) are the weight and bias parameters at layer k-1 of the hyperedge of type l between node i and node u in the hypergraph structure;
l is any hyperedge type among grammar edge, co-reference edge, adjacent word edge, adjacent sentence edge, and reflexive edge.
7. The method of claim 6, wherein
the GCNN coding layer of the HG-GCNN model, based on the hypergraph structure, also obtains a hyperedge value for each hyperedge in the hypergraph structure;
the hyperedge value is the hyperedge weight multiplied by the sum of the node information of every word associated with the hyperedge, plus the preset bias parameter.
8. The method according to claim 7, wherein S3 specifically comprises:
S31, obtaining, through the inference and judgment layer of the HG-GCNN model and a fully connected layer, a confidence score matrix S12 between the first entity e1 and the second entity e2;
each position in the confidence score matrix S12 corresponding to a preset entity relationship type;
wherein,
S12 = (Embedding1 R) Embedding2
R being the parameter matrix of the fully connected layer, pre-trained using a classification method of multi-instance learning;
Embedding1 being the matrix composed of the vector representations under the first entity e1 of all the words in the hypergraph structure, obtained from layer k of the GCNN coding layer;
Embedding2 being the matrix composed of the vector representations under the second entity e2 of all the words in the hypergraph structure, obtained from layer k of the GCNN coding layer;
S32, determining the entity relationship between the first entity e1 and the second entity e2 according to the confidence score matrix S12 and the preset entity relationship type corresponding to each position in S12.
9. The method according to claim 8, wherein S32 specifically comprises:
taking the preset entity relationship type corresponding to the position holding the maximum value among all positions of the confidence score matrix S12 as the entity relationship between the first entity e1 and the second entity e2.
10. A document-level entity relationship extraction device based on a hypergraph convolutional neural network, the device comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and wherein the processor calls the program instructions to perform the hypergraph convolutional neural network-based document level entity relationship extraction method of any of claims 1 to 9.
CN202111241687.4A 2021-10-25 2021-10-25 Document level entity relation extraction method and device based on hypergraph convolutional neural network Pending CN114118088A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111241687.4A CN114118088A (en) 2021-10-25 2021-10-25 Document level entity relation extraction method and device based on hypergraph convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111241687.4A CN114118088A (en) 2021-10-25 2021-10-25 Document level entity relation extraction method and device based on hypergraph convolutional neural network

Publications (1)

Publication Number Publication Date
CN114118088A true CN114118088A (en) 2022-03-01

Family

ID=80376623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111241687.4A Pending CN114118088A (en) 2021-10-25 2021-10-25 Document level entity relation extraction method and device based on hypergraph convolutional neural network

Country Status (1)

Country Link
CN (1) CN114118088A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817663A (en) * 2022-05-05 2022-07-29 杭州电子科技大学 Service modeling and recommendation method based on class perception graph neural network
CN114817663B (en) * 2022-05-05 2023-02-17 杭州电子科技大学 Service modeling and recommendation method based on class perception graph neural network
CN115238075A (en) * 2022-07-30 2022-10-25 北京理工大学 Text sentiment classification method based on hypergraph pooling
CN115238075B (en) * 2022-07-30 2023-04-07 北京理工大学 Text sentiment classification method based on hypergraph pooling
CN116561588A (en) * 2023-07-07 2023-08-08 北京国电通网络技术有限公司 Power text recognition model construction method, power equipment maintenance method and device
CN116561588B (en) * 2023-07-07 2023-10-20 北京国电通网络技术有限公司 Power text recognition model construction method, power equipment maintenance method and device

Similar Documents

Publication Publication Date Title
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
Chen et al. Knowedu: A system to construct knowledge graph for education
US11604956B2 (en) Sequence-to-sequence prediction using a neural network model
CN103154936B (en) For the method and system of robotization text correction
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
CN114118088A (en) Document level entity relation extraction method and device based on hypergraph convolutional neural network
CN110309267A (en) Semantic retrieving method and system based on pre-training model
CN110619044B (en) Emotion analysis method, system, storage medium and equipment
CN111753101A (en) Knowledge graph representation learning method integrating entity description and type
CN110738057A (en) text style migration method based on grammatical constraint and language model
Ling et al. Integrating extra knowledge into word embedding models for biomedical NLP tasks
CN111160564A (en) Chinese knowledge graph representation learning method based on feature tensor
Haije et al. Automatic comment generation using a neural translation model
CN107967285A (en) Data processing method and data processing equipment
Wang et al. Logic rules powered knowledge graph embedding
Wadud et al. Text coherence analysis based on misspelling oblivious word embeddings and deep neural network
Sun et al. Knowledge-guided bayesian support vector machine for high-dimensional data with application to analysis of genomics data
CN110969005B (en) Method and device for determining similarity between entity corpora
Labbé et al. ChatGPT for phenotypes extraction: one model to rule them all?
CN116720519B (en) Seedling medicine named entity identification method
Lin et al. Robust educational dialogue act classifiers with low-resource and imbalanced datasets
CN114692615B (en) Small sample intention recognition method for small languages
CN110413789A (en) A kind of exercise automatic classification method based on SVM
CN116680407A (en) Knowledge graph construction method and device
US20240232245A1 (en) Method, device, and medium for consistency detection of a document and an abstract

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination