CN115081392A - Document level relation extraction method based on adjacency matrix and storage device - Google Patents
- Publication number: CN115081392A
- Application number: CN202210602851.8A
- Authority: CN (China)
- Prior art keywords: matrix, relation, entity, adjacency matrix, order
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/14 — Handling natural language data; text processing; use of codes for handling textual entities; tree-structured documents
- G06F17/16 — Complex mathematical operations; matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F40/30 — Handling natural language data; semantic analysis
- G06N3/04 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06N5/04 — Computing arrangements using knowledge-based models; inference or reasoning models
Abstract
The present application relates to the technical field of document-level relation extraction, and in particular to a document-level relation extraction method and a storage device based on an adjacency matrix. The method comprises the following steps: modeling the document-level long text with a Transformer-XL model; modeling each entity pair that has a relation as a dependency tree; generating an adjacency matrix for the related entity pairs from the dependency trees; fusing the relation features associated with the target relation feature through weighted attention; and generating the probability of each relation for the entity pair from the fused feature matrix. Because the method models the long text sequence of the document with Transformer-XL, the text segments retain semantic links to one another and there is no upper limit on the length of text that can be modeled.
Description
Technical Field
The present application relates to the technical field of document-level relation extraction, and in particular to a document-level relation extraction method and a storage device based on an adjacency matrix.
Background
Document-level relation extraction aims to extract the relations between entity pairs within a document. As an information extraction method, it plays an important role in constructing large-scale knowledge graphs. However, current relation extraction is mainly sentence-level: it extracts a relation that holds between an entity pair within a single sentence. In real-world applications, most entity pairs that hold a relation appear in different sentences, which makes document-level relation extraction a relatively more difficult task than sentence-level relation extraction.
Existing document-level relation extraction methods often have the following defects:
1. All relation features in the attention or convolution domain are fused into the target relation feature. Some of these features are closely tied to the target relation feature; they provide rich semantic links and help the target feature perform intra-sentence or inter-sentence reasoning. But the fusion also introduces much noise, such as the relation features of entity pairs that hold no relation, which exist widely in the convolution or attention domain.
2. Because the length of a document generally exceeds the coding range of the BERT model, existing methods split the long text sequence and encode each segment separately. This works around BERT's inability to process long text, but it creates a semantic fault: the encoded segments retain only weak semantic links to one another.
3. The two commonly used methods for fusing the features of different entity pairs in the feature matrix perform implicit fusion: they are weakly targeted, achieving the fusion of related relation features mainly by fusing elements over a large region of the matrix. This is computationally expensive, fuses poorly, and is an important factor holding back model performance.
Disclosure of Invention
In view of the above problems, the present application provides a document-level relation extraction method based on an adjacency matrix, so as to solve the technical problems mentioned in the background. The specific technical scheme is as follows:
a document-level relation extraction method based on an adjacency matrix comprises the following steps:
modeling the document-level long text with a Transformer-XL model;
constructing an entity-pair relation feature matrix;
modeling each entity pair that has a relation as a path dependency tree;
generating an adjacency matrix between the related entity pairs from the dependency trees;
calculating a visibility matrix from the adjacency matrix;
fusing the relation features associated with the target entity pair's relation feature through a self-attention mechanism;
and calculating the probability of each relation for the entity pair from the fused feature matrix.
Further, the calculating of the visibility matrix from the adjacency matrix further comprises:
repeating the step of computing the n-th order adjacency matrix from the (n-1)-th order matrix until the pairs of relation features whose entries are 1 in the n-th order adjacency matrix satisfy a preset condition;
calculating the visibility matrix V from the first n orders of the adjacency matrix:

V = A + A^2 + ... + A^n

where A is the first-order matrix, A^2 the second-order matrix, and A^n the n-th order matrix, with n a natural number greater than or equal to 2.
Further, the fusing of the relation features associated with the target relation feature through weighted attention further comprises:
assigning different weights according to the number of hops between relation features: the larger the hop count, the smaller the weight.
Further, the root node of the dependency tree is the corresponding entity pair, and the first-layer nodes represent the entity-pair relation features in the adjacency matrix that are directly connected to the entity pair in the horizontal and vertical directions.
Further, the constructing of the relation feature matrix further comprises:
calculating all entity embeddings in the document and constructing the relation feature matrix from them.
In order to solve the above technical problem, a storage device is further provided. The specific technical scheme is as follows:
a storage device having stored therein a set of instructions for performing:
modeling the document-level long text with a Transformer-XL model;
constructing an entity-pair relation feature matrix;
modeling each entity pair that has a relation as a path dependency tree;
generating an adjacency matrix between the related entity pairs from the dependency trees;
calculating a visibility matrix from the adjacency matrix;
fusing the relation features associated with the target entity pair's relation feature through a self-attention mechanism;
and calculating the probability of each relation for the entity pair from the fused feature matrix.
Further, the set of instructions is further operable to perform: the calculating of the visibility matrix from the adjacency matrix further comprises:
repeating the step of computing the n-th order adjacency matrix from the (n-1)-th order matrix until the pairs of relation features whose entries are 1 in the n-th order adjacency matrix satisfy a preset condition;
calculating the visibility matrix V from the first n orders of the adjacency matrix:

V = A + A^2 + ... + A^n

where A is the first-order matrix, A^2 the second-order matrix, and A^n the n-th order matrix, with n a natural number greater than or equal to 2.
Further, the set of instructions is further for performing:
the fusing of the relation features associated with the target relation feature through weighted attention further comprises:
assigning different weights according to the number of hops between relation features: the larger the hop count, the smaller the weight.
Further, the root node of the dependency tree is the corresponding entity pair, and the first-layer nodes represent the entity-pair relation features in the adjacency matrix that are directly connected to the entity pair in the horizontal and vertical directions.
Further, the set of instructions is further for performing:
the constructing of the relation feature matrix further comprises:
calculating all entity embeddings in the document and constructing the relation feature matrix from them.
The beneficial effects of the invention are as follows. The document-level relation extraction method based on an adjacency matrix comprises: modeling the document-level long text with a Transformer-XL model; constructing an entity-pair relation feature matrix; modeling each entity pair that has a relation as a path dependency tree; generating an adjacency matrix between the related entity pairs from the dependency trees; calculating a visibility matrix from the adjacency matrix; fusing the relation features associated with the target entity pair's relation feature through a self-attention mechanism; and calculating the probability of each relation for the entity pair from the fused feature matrix. Because the method models the long text sequence of the document with Transformer-XL, the text segments retain semantic links to one another and there is no upper limit on the length of text that can be modeled, effectively avoiding the semantic fault caused by segment-wise BERT modeling. The adjacency matrix captures features at different hop counts with a clear modeling objective: only entity pairs that have a relation are modeled, and unrelated pairs are discarded, which avoids introducing noise that would degrade model performance. Modeling only the related entity pairs also effectively reduces computational complexity and speeds up model training and inference.
The above is only an outline of the present invention; those skilled in the art may implement the invention based on the content of the specification and drawings. To make the above and other objects, features, and advantages of the invention easier to understand, the following description is given in conjunction with the embodiments of the present application and the accompanying drawings.
Drawings
The drawings are only for purposes of illustrating the principles, implementations, applications, features, and effects of particular embodiments of the present application, as well as others related thereto, and are not to be construed as limiting the application.
In the drawings of the specification:
FIG. 1 is a schematic diagram of the embodiment in which Bartolomeo Altomonte and Altomonte are different mentions belonging to the same entity;
FIG. 2 is a diagram illustrating relation feature fusion treated as a semantic segmentation task according to an embodiment;
FIG. 3 is a diagram illustrating the use of stacked criss-cross attention modules to fuse information in an entity-pair relation feature matrix according to an embodiment;
FIG. 4 is a flowchart illustrating a document level relationship extraction method based on adjacency matrices according to an embodiment;
FIG. 5 is a schematic diagram of a model applied to a document level relationship extraction method based on an adjacency matrix according to an embodiment;
fig. 6 is a block diagram of a storage device according to an embodiment.
The reference numerals referred to in the above figures are explained below:
600. a storage device.
Detailed Description
In order to explain in detail possible application scenarios, technical principles, practical embodiments, and the like of the present application, the following detailed description is given with reference to the accompanying drawings in conjunction with the listed embodiments. The embodiments described herein are merely for more clearly illustrating the technical solutions of the present application, and therefore, the embodiments are only used as examples, and the scope of the present application is not limited thereby.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or related to other embodiments specifically defined. In principle, in the present application, the technical features mentioned in the embodiments can be combined in any manner to form a corresponding implementable technical solution as long as there is no technical contradiction or conflict.
Unless otherwise defined, technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the use of relational terms herein is intended only to describe particular embodiments and is not intended to limit the present application.
In the description of the present application, the term "and/or" is an expression describing a logical relationship between objects and means that three relationships may exist; for example, A and/or B means: A alone, B alone, or both A and B. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
In this application, terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Without further limitation, in this application, the use of "including," "comprising," "having," or other similar expressions in phrases and expressions of "including," "comprising," or "having," is intended to cover a non-exclusive inclusion, and such expressions do not exclude the presence of additional elements in a process, method, or article that includes the recited elements, such that a process, method, or article that includes a list of elements may include not only those elements but also other elements not expressly listed or inherent to such process, method, or article.
As understood in examination guidelines, terms such as "greater than", "less than" and "exceeding" in this application are understood to exclude the stated number, while "above", "below" and "within" are understood to include it. In addition, in the description of the embodiments of the present application, "a plurality" means two or more (including two), and similar expressions involving "a plurality", such as "a plurality of groups" and "a plurality of times", are understood likewise unless specifically defined otherwise.
Document level relationship extraction, as mentioned in the background, is a relatively more difficult task than sentence level relationship extraction.
The currently common approach is as follows:
First, a document is given, where N represents its length. A special symbol is inserted before and after each entity mention in the document, thereby marking the mention's location. The marked document is encoded with BERT as the encoder, producing an embedded representation with context information:

H = [h_1, h_2, ..., h_n] = BERT([x_1, x_2, ..., x_n])

where n represents the length of the document after the special symbols are inserted.
The embedding of the special marker preceding each entity mention is taken as the embedding of that mention. Since each entity in the document may have multiple corresponding mentions, the different mentions are used together to generate the representation of the entity. For example, Bartolomeo Altomonte and Altomonte in FIG. 1 are different mentions belonging to the same entity.
After obtaining the embedding of each mention, the entity embedding h_e is generated by applying the logsumexp function over the representations of the entity's mentions.
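The logsumexp pooling of mention embeddings described above can be sketched as follows (a purely illustrative numpy sketch, not part of the claims; the mention values and hidden size are made up for the example):

```python
import numpy as np

def entity_embedding(mention_embs):
    """Pool the embeddings of all mentions of one entity with logsumexp,
    a smooth approximation of element-wise max pooling."""
    m = mention_embs.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    return (m + np.log(np.exp(mention_embs - m).sum(axis=0, keepdims=True))).ravel()

# a hypothetical entity mentioned three times, hidden size 4
mentions = np.array([[0.1, 0.2, 0.3, 0.4],
                     [0.5, 0.1, 0.0, 0.2],
                     [0.3, 0.3, 0.3, 0.3]])
h_e = entity_embedding(mentions)  # shape (4,), one embedding for the entity
```

Because logsumexp upper-bounds the element-wise maximum, the pooled entity embedding is dominated by, but smoother than, its strongest mention.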
After all the entity embeddings in the document are obtained, an entity-pair feature representation matrix M is generated from them. First, the document representation, the local feature context information, and the entity embeddings are combined to obtain semantically enriched entity representations, where W_s and W_o are model parameters, the entity embeddings of the subject and object serve as inputs, h_doc is the document embedding, and c_{s,o} is the local feature context obtained by multiplying the top-layer attention score matrices of the subject and object from the BERT encoder with the word embeddings of the document:
after obtaining the enhanced representation of the subject and the object, generating the entity pair relation feature vector through a Feed Forward Neural Network (FFNN):
M s,o =FFNN([u s ,u o ])
the above operations on the entity pairs in the document can obtain an entity pair feature matrix M, in which the elements represent the representation of the corresponding relationship between the entity pairs, but there is no semantic relation between the entity pairs. In the document level relation prediction, information between entity pairs plays an important role in cross-sentence reasoning, so that the combination of characteristics in the entity pair relation characteristic demonstration can be helpful for improving the expression of relation extraction.
The currently common methods for fusing entity-pair relation features are mainly the following two:
1. Treating relation feature fusion as a semantic segmentation task (as shown in FIG. 2).
The semantic segmentation task essentially models each pixel of an image to obtain semantically enriched pixel representations, then assigns a class to each pixel. This is similar in form to the document-level relation extraction task: both model the points of a matrix and assign them labels. The relation feature matrix is treated as an image composed of pixels and encoded with a U-Net. Specifically, the relation feature matrix is downsampled with convolution and pooling layers, then upsampled with convolution and deconvolution layers, yielding a semantically enriched relation feature matrix.
Each element of the enriched relation matrix is then passed through a linear layer and an activation function to obtain the probability of each relation for the entity pair.
2. Using stacked criss-cross attention modules to fuse information in the entity-pair relation feature matrix (as shown in FIG. 3).
The criss-cross attention network (Criss-Cross Attention Network) uses attention to capture the horizontal and vertical information in the relation feature matrix, thereby completing information fusion. However, a single criss-cross attention layer can only merge horizontal and vertical information into the target entity pair, which is not enough to support relation classification. Stacking several criss-cross layers therefore indirectly fuses information from the whole matrix into the target entity pair, achieving semantic enhancement. The calculation is as follows:
where N_e is the size of the relation feature matrix, A_{(s,o)→(s,i)} is the attention score from M_{s,o} to M_{s,i}, and A_{(s,o)→(i,o)} is the attention score from M_{s,o} to M_{i,o}. The operation is then repeated over each feature representation in the matrix to achieve feature fusion.
After the fused feature matrix M'_{s,o} is obtained, it is fused with the entity representation information, and finally the probability of each relation for the entity pair is generated, where σ denotes the activation function.
As mentioned in the background, the above methods suffer from the three disadvantages listed there.
FIG. 4 is a flowchart of the document-level relation extraction method based on an adjacency matrix of the present application, which can be applied to a storage device including, but not limited to: personal computers, servers, general-purpose computers, special-purpose computers, network appliances, embedded appliances, programmable appliances, intelligent mobile terminals, and the like. As shown in FIG. 4, the method comprises steps S401 to S407.
In step S401, the document-level long text is modeled with a Transformer-XL model. This overcomes BERT's inability to model long sequences and preserves semantic continuity across the long sequence. The Transformer-XL encoding is:

H = [h_1, h_2, ..., h_n] = Transformer-XL([x_1, x_2, ..., x_n])
in step S402, an entity-pair relationship feature matrix is constructed. Specifically, all entity embedded representations in the document are calculated, and a relationship feature matrix is constructed according to the embedded representations. The specific operation process is the same as the above-mentioned existing method.
After the relation feature matrix M is generated, step S403 is performed: each entity pair that has a relation is modeled as a path dependency tree. Specifically, each entity pair with a certain relation is modeled as a dependency tree whose root node is the entity pair and whose first-layer child nodes represent the entity-pair relation features in the relation matrix that are directly connected to the entity pair horizontally and vertically. Having a certain relation means: for a given entity pair <e_s, e_o> and a predefined relation type set R, if a relation type from R holds between the pair, the pair is said to have a relation; otherwise it is said to have none.
In step S404, an adjacency matrix between the entity pairs that have a relation is generated from the dependency trees.
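Step S404 can be sketched as follows, under the assumption that two relation features are first-order adjacent exactly when their entity pairs share a row (subject) or a column (object) of the relation feature matrix, as the first-layer nodes of the dependency tree suggest; the concrete index pairs are made up for the example:

```python
def first_order_adjacency(cells):
    """Build the first-order adjacency matrix over the entity pairs that do
    have a relation. cells is a list of (s, o) index pairs; two relation
    features are first-order adjacent when they share a subject (same row)
    or an object (same column) in the relation feature matrix."""
    k = len(cells)
    A = [[0] * k for _ in range(k)]
    for i, (si, oi) in enumerate(cells):
        for j, (sj, oj) in enumerate(cells):
            if i != j and (si == sj or oi == oj):
                A[i][j] = 1
    return A

# three related entity pairs: (0,1) and (0,2) share the subject; (3,2) shares
# an object with (0,2); unrelated pairs are simply never listed
A = first_order_adjacency([(0, 1), (0, 2), (3, 2)])
# A[0][1] == 1 (same row), A[1][2] == 1 (same column), A[0][2] == 0
```

Note that only related entity pairs appear in `cells`, matching the patent's point that unrelated pairs are discarded before any fusion.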
After the adjacency matrix A is established, step S405 is executed: a visibility matrix is calculated from the adjacency matrix. Specifically:
the step of computing the n-th order adjacency matrix from the (n-1)-th order matrix is repeated until the pairs of relation features whose entries are 1 in the n-th order adjacency matrix satisfy a preset condition;
the visibility matrix V is then calculated from the first n orders of the adjacency matrix:

V = A + A^2 + ... + A^n

where A is the first-order matrix, A^2 the second-order matrix, and A^n the n-th order matrix, with n a natural number greater than or equal to 2.
An entry of 1 in A indicates first-order adjacency between two relation features: in the relation matrix they are visible to one another horizontally or vertically. The second-order adjacency matrix A^2 is then computed from the first-order matrix; an entry of 1 in the second-order matrix indicates second-order adjacency between two relation features, which represents two-hop reasoning in the document. By analogy, the third-order adjacency matrix A^3 represents three-hop inference. Once the n-th order adjacency matrix has been computed and the relation-feature pairs whose entries would be 1 in subsequent orders no longer have an obvious association, the computation stops. "No obvious association" means the degree of association between the two relation features is below a preset threshold, whose value is set by the application scenario. The value of n must be predefined according to the application scenario; n less than or equal to 5 is generally chosen. From the first n orders of the adjacency matrix, the visibility matrix V is calculated:

V = A + A^2 + ... + A^n
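The computation V = A + A^2 + ... + A^n can be sketched as follows (illustrative; each higher-order matrix is binarised here so that its entries of 1 mark k-order adjacency, matching the description of the orders as 0/1 matrices):

```python
import numpy as np

def visibility_matrix(A, n):
    """V = A + A^2 + ... + A^n; each power is binarised so an entry of 1
    marks k-order adjacency, and a nonzero V[i][j] means relation feature j
    is reachable from relation feature i within n hops."""
    power, V = A.copy(), A.copy()
    for _ in range(2, n + 1):
        power = (power @ A > 0).astype(A.dtype)  # next-order adjacency
        V = V + power
    return V

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
V = visibility_matrix(A, 3)
# node 0 reaches node 2 in two hops, so V[0][2] > 0 although A[0][2] == 0
```

Keeping the per-order matrices separate instead of summing them would also give the hop count of each visible feature, which the weighted fusion of the next step needs.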
In step S406, the relation features associated with the target entity pair's relation feature are fused through a self-attention mechanism. An entry of 1 in the visibility matrix V marks a relation feature related to the target feature within n-hop reasoning; such features are considered to influence the classification of the target relation feature, so they are fused by weighted attention. Because the features lie at different hop counts, different weights are applied during fusion: the larger the hop count, the smaller the weight. Here M'_{s,o} denotes the fused relation feature, M_{s,o} the relation feature before fusion, and α > β > γ the weights corresponding to the different hop counts; A_1 denotes the attention coefficient for first-order visibility and M_1 the first-order visible relation feature.
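The hop-weighted attention fusion of step S406 can be sketched as follows (illustrative; the concrete values of α, β, γ, the scaled dot-product attention, and the residual update are assumptions for the example, not the patent's exact formulation):

```python
import numpy as np

def hop_weighted_fusion(target, features, hops, hop_weights=(0.5, 0.3, 0.2)):
    """Fuse the features visible from the target relation feature, scaling
    each self-attention weight by its hop count (alpha > beta > gamma)."""
    d = target.shape[0]
    scores = features @ target / np.sqrt(d)            # self-attention scores
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                                 # softmax over visible features
    w = np.array([hop_weights[h - 1] for h in hops])   # smaller weight for more hops
    return target + ((attn * w)[:, None] * features).sum(axis=0)  # residual update

rng = np.random.default_rng(2)
target = rng.normal(size=8)        # M_{s,o} before fusion
visible = rng.normal(size=(4, 8))  # features visible within n hops (here n = 3)
M_fused = hop_weighted_fusion(target, visible, hops=[1, 1, 2, 3])
```

The hop counts would come from the per-order adjacency matrices: a feature first reached in A^k sits k hops from the target.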
After the fused relation feature M'_{s,o} is generated, step S407 is executed: the probability of each relation for the entity pair is calculated from the fused feature matrix, using the same method as described above:
as shown in FIG. 5, in the model diagram adopted by the method, a transform-XL model is adopted to model a long text sequence in a document, so that the text between segments has semantic relation, and no upper limit is provided for the length of the modeled text. Semantic faults caused by BERT modeling can be effectively avoided. And the characteristics of different steps are captured by adopting an adjacency matrix method, the objective modeling is clear, and only the entity pairs with certain relation are modeled and the entity pairs without relation are discarded, so that the introduction of noise can be avoided, and the performance of the model is influenced. Only the entity pairs with certain relations are modeled, so that the calculation complexity can be effectively reduced, and the training and reasoning speed of the model is improved.
Referring now to FIG. 6, a specific embodiment of a storage device 600 is illustrated:
a storage device 600 having stored therein a set of instructions for performing:
The document-level long text is modeled by a Transformer-XL model. This overcomes the defect that BERT cannot model long sequences and preserves semantic continuity across long inputs. The encoding process of Transformer-XL is as follows:
H = [h₁, h₂, ..., hₙ] = Transformer-XL([x₁, x₂, ..., xₙ])
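As a schematic illustration of why segment-level recurrence places no upper limit on text length, the NumPy sketch below processes a sequence segment by segment, letting each segment attend to the cached hidden states of the previous one. This is a simplified single-head toy, not the actual Transformer-XL implementation (which additionally uses relative positional encodings, multiple layers, and stop-gradient memories):

```python
import numpy as np

def attend(q, kv):
    # simplified single-head scaled dot-product attention
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ kv

def encode_long_text(embeddings, seg_len=4):
    # Process an arbitrarily long sequence segment by segment; each
    # segment attends to the cached states of the previous segment,
    # so information flows across segment boundaries.
    memory = np.zeros((0, embeddings.shape[1]))
    outputs = []
    for start in range(0, len(embeddings), seg_len):
        seg = embeddings[start:start + seg_len]
        context = np.concatenate([memory, seg], axis=0)
        hidden = attend(seg, context)
        outputs.append(hidden)
        memory = hidden  # cached (with stop-gradient in the real model)
    return np.concatenate(outputs, axis=0)

H = encode_long_text(np.random.default_rng(0).normal(size=(10, 8)))
print(H.shape)  # (10, 8)
```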
An entity pair relation feature matrix is constructed. Specifically, embedded representations of all entities in the document are calculated, and the relation feature matrix is built from these embeddings. The specific operation is the same as in the method described above.
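One plausible construction of the relation feature matrix, assuming each pair feature is a nonlinear projection of the concatenated head and tail entity embeddings (the patent defers its exact operation to the earlier description, so this is an illustrative sketch only):

```python
import numpy as np

def relation_feature_matrix(entity_embeddings, W):
    # Feature of entity pair (i, j): a projection of the concatenated
    # head and tail entity embeddings (one common construction; the
    # patent's exact operation is given in its earlier description).
    n, d = entity_embeddings.shape
    M = np.zeros((n, n, W.shape[0]))
    for i in range(n):
        for j in range(n):
            pair = np.concatenate([entity_embeddings[i], entity_embeddings[j]])
            M[i, j] = np.tanh(W @ pair)
    return M

rng = np.random.default_rng(2)
E = rng.normal(size=(4, 8))      # embeddings of 4 entities, 8-dim each
W = rng.normal(size=(16, 16))    # projection to 16-dim pair features
M = relation_feature_matrix(E, W)
print(M.shape)  # (4, 4, 16)
```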
After the relation feature matrix M is generated, each entity pair that has a relation is modeled as a path dependency tree. Specifically: for each entity pair with a relation, a dependency tree is built whose root node is the entity pair itself and whose first-layer child nodes are the relational feature representations in the adjacency matrix that are directly connected to the entity pair in the horizontal and vertical directions. "Having a relation" means: for a given entity pair <e_s, e_o> and a predefined relation type set R, if a relation type from R holds between the two entities, the pair is said to have a relation; otherwise it has none.
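The first-layer children described above, i.e. the relational features sharing the target pair's row or column in M, can be enumerated with a small helper (a hypothetical function for illustration only; self-pairs are excluded here by assumption):

```python
def first_order_neighbors(s, o, num_entities):
    # Relational features directly connected to the pair (s, o) in the
    # relation feature matrix M: those in the same row (horizontal
    # direction, shared head entity s) or the same column (vertical
    # direction, shared tail entity o).
    neighbors = set()
    for k in range(num_entities):
        if k not in (s, o):
            neighbors.add((s, k))  # horizontal: row of s
            neighbors.add((k, o))  # vertical: column of o
    return neighbors

ns = first_order_neighbors(0, 1, 4)
print(sorted(ns))  # [(0, 2), (0, 3), (2, 1), (3, 1)]
```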
An adjacency matrix A is established by generating an adjacency matrix between the related entity pairs according to the dependency tree, and a visibility matrix is then calculated from it. Specifically: the step of calculating the n-th-order adjacency matrix from the (n−1)-th-order matrix is repeated until the relational feature pairs whose elements are 1 in the n-th-order adjacency matrix satisfy a preset condition;

the visibility matrix V is calculated from the first n orders of the adjacency matrix:

V = A + A² + ... + Aⁿ

where A denotes the first-order matrix, A² the second-order matrix, and Aⁿ the n-th-order matrix, n being a natural number greater than or equal to 2.

An element of 1 in A indicates first-order adjacency between two relational features, i.e. mutual visibility along the horizontal and vertical directions of the relation matrix. The second-order adjacency matrix A² is then computed from the first-order matrix; an element of 1 in A² indicates second-order adjacency between two relational features and thus represents two-hop reasoning within the document. By analogy, the third-order adjacency matrix A³ represents three-hop reasoning. When the n-th-order adjacency matrix is computed, relational feature pairs whose elements are not 1 in subsequent adjacency matrices are considered to have no obvious relation and are not computed further; "no obvious relation" means the degree of association between the two features is below a preset threshold, the value of which is defined by the actual application scenario. The value of n must be predefined and is likewise determined by the application scenario; n ≤ 5 is generally chosen. From the first n orders of the adjacency matrix, the visibility matrix V can be calculated:

V = A + A² + ... + Aⁿ
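The computation V = A + A² + ... + Aⁿ can be sketched directly in NumPy. The adjacency matrix below is a made-up example over four relational features, and V is binarized here (an illustrative choice) so that an element of 1 marks visibility within n hops:

```python
import numpy as np

# A made-up first-order adjacency matrix over 4 relational features;
# A[i][j] = 1 means features i and j are directly (1-hop) adjacent.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

def visibility_matrix(A, n):
    # V = A + A^2 + ... + A^n, binarized so that V[i][j] = 1 exactly
    # when feature j is reachable from feature i within n hops.
    V = np.zeros_like(A)
    power = np.eye(len(A), dtype=A.dtype)
    for _ in range(n):
        power = power @ A   # power holds A^k at step k
        V = V + power
    return (V > 0).astype(int)

V = visibility_matrix(A, n=3)
print(V[0])  # [1 1 1 1]
```

With n = 3, feature 3 becomes visible from feature 0 via the three-hop path 0 → 1 → 2 → 3, which is exactly the multi-hop reasoning the visibility matrix is meant to capture.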
The relational features associated with the target entity pair's relational feature are fused through a self-attention mechanism. An element of 1 in the visibility matrix V marks a relational feature related to the target feature within n-hop reasoning, and each such feature is considered to influence the classification of the target feature; these features are therefore fused by weighted attention. Because different relational features lie at different hop distances from the target, different weights are applied during fusion: the larger the hop count, the smaller the weight.

where M′s,o denotes the relational feature after fusion, Ms,o the relational feature before fusion, and α, β, γ the weights corresponding to the different hop counts, with α > β > γ; A₁ denotes the attention coefficient of the first-order visibility matrix and M₁ the first-order visible relational features.

After the fused relational feature M′s,o is generated, the probability of each relation for the entity pair is calculated from the fused feature matrix. The method used is the same as described above.

The instruction set executed by the storage device 600 uses a Transformer-XL model to encode the long text sequences in documents, so that text across segments remains semantically connected and there is no upper limit on the length of the modeled text; the semantic breaks caused by BERT modeling are thereby avoided. Features at different hop distances are captured with the adjacency matrix method, so the modeling target is explicit: only entity pairs that have a relation are modeled and unrelated entity pairs are discarded, which avoids introducing noise that would degrade model performance. Modeling only the related entity pairs also effectively reduces computational complexity and speeds up model training and inference.
Finally, it should be noted that although the above embodiments are described in the text and drawings of the present application, the scope of patent protection of the present application is not limited thereby. All technical solutions generated by equivalent structural or process substitutions or modifications based on the content described in the text and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present application.
Claims (10)
1. A document level relation extraction method based on an adjacency matrix is characterized by comprising the following steps:
modeling the long text of the document level through a Transformer-XL model;
constructing an entity pair relation characteristic matrix;
modeling each entity pair having a relation as a path dependency tree;
generating an adjacency matrix between the entity pairs having a relation according to the dependency tree;
calculating a visibility matrix from the adjacency matrix;
fusing the relation characteristics related to the relation characteristics of the target entity pair through a self-attention mechanism;
and calculating the probability of the corresponding relation of the entity pair according to the fused feature matrix.
2. The method for extracting document-level relations based on an adjacency matrix according to claim 1, wherein calculating the visibility matrix from the adjacency matrix further comprises:
repeating the step of calculating the n-th-order adjacency matrix from the (n−1)-th-order matrix until the relational feature pairs whose elements are 1 in the n-th-order adjacency matrix satisfy a preset condition; and
calculating the visibility matrix V from the first n orders of the adjacency matrix:
V = A + A² + ... + Aⁿ
wherein A denotes the first-order matrix, A² the second-order matrix, and Aⁿ the n-th-order matrix, n being a natural number greater than or equal to 2.
3. The method according to claim 1, wherein fusing the relational features associated with the target relational feature through weighted attention further comprises:
determining different weights according to the hop counts of the different relational features, wherein the larger the hop count, the smaller the weight.
4. The method according to claim 1, wherein the root node of the dependency tree is the corresponding entity pair, and the first-layer nodes represent the entity pair relational feature representations in the adjacency matrix that are directly connected to the entity pair in the horizontal and vertical directions.
5. The method for extracting document-level relations based on an adjacency matrix according to claim 1, wherein constructing the relation feature matrix further comprises:
calculating embedded representations of all entities in the document, and constructing the relation feature matrix from the embedded representations.
6. A storage device having a set of instructions stored therein, the set of instructions being operable to perform:
modeling the long text of the document level through a Transformer-XL model;
constructing an entity pair relation characteristic matrix;
modeling each entity pair having a relation as a path dependency tree;
generating an adjacency matrix between the entity pairs having a relation according to the dependency tree;
calculating a visibility matrix from the adjacency matrix;
fusing the relation characteristics related to the relation characteristics of the target entity pair through a self-attention mechanism;
and calculating the probability of the corresponding relation of the entity pair according to the fused feature matrix.
7. The storage device of claim 6, wherein the set of instructions is further configured to perform: the calculating of the visibility matrix from the adjacency matrix further comprising:
repeating the step of calculating the n-th-order adjacency matrix from the (n−1)-th-order matrix until the relational feature pairs whose elements are 1 in the n-th-order adjacency matrix satisfy a preset condition; and
calculating the visibility matrix V from the first n orders of the adjacency matrix:
V = A + A² + ... + Aⁿ
wherein A denotes the first-order matrix, A² the second-order matrix, and Aⁿ the n-th-order matrix, n being a natural number greater than or equal to 2.
8. The storage device of claim 6, wherein the set of instructions is further configured to perform:
the fusing of the relational features associated with the target relational feature through weighted attention further comprising:
determining different weights according to the hop counts of the different relational features, wherein the larger the hop count, the smaller the weight.
9. The storage device according to claim 6, wherein the root node of the dependency tree is the corresponding entity pair, and the first-layer nodes represent the entity pair relational feature representations in the adjacency matrix that are directly connected to the entity pair in the horizontal and vertical directions.
10. The storage device of claim 6, wherein the set of instructions is further configured to perform:
the constructing of the relation feature matrix further comprising:
calculating embedded representations of all entities in the document, and constructing the relation feature matrix from the embedded representations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210602851.8A CN115081392A (en) | 2022-05-30 | 2022-05-30 | Document level relation extraction method based on adjacency matrix and storage device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115081392A true CN115081392A (en) | 2022-09-20 |
Family
ID=83250131
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210602851.8A Pending CN115081392A (en) | 2022-05-30 | 2022-05-30 | Document level relation extraction method based on adjacency matrix and storage device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115081392A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116521888A (en) * | 2023-03-20 | 2023-08-01 | 麦博(上海)健康科技有限公司 | Method for extracting medical long document cross-sentence relation based on DocRE model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111782768A (en) * | 2020-06-30 | 2020-10-16 | 首都师范大学 | Fine-grained entity identification method based on hyperbolic space representation and label text interaction |
CN111967258A (en) * | 2020-07-13 | 2020-11-20 | 中国科学院计算技术研究所 | Method for constructing coreference resolution model, coreference resolution method and medium |
CN113282750A (en) * | 2021-05-27 | 2021-08-20 | 成都数之联科技有限公司 | Model training method, system, device and medium |
CN114153942A (en) * | 2021-11-17 | 2022-03-08 | 中国人民解放军国防科技大学 | Event time sequence relation extraction method based on dynamic attention mechanism |
Non-Patent Citations (2)
Title |
---|
"Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context", https://doi.org/10.48550/arXiv.1901.02860, 2 June 2019 (2019-06-02) *
Tang Lingyan: "A Survey of Deep-Learning-Based Sentiment Analysis of Short Texts" (in Chinese), 计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), vol. 15, no. 5, 31 May 2021 (2021-05-31) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||