CN110781683B

CN110781683B - Entity relation joint extraction method

Info

Publication number: CN110781683B
Application number: CN201911063750.2A
Authority: CN
Inventors: 冯钧; 杭婷婷; 李晓东; 陆佳民; 严乐; 朱跃龙
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2019-11-04
Filing date: 2019-11-04
Publication date: 2024-04-05
Anticipated expiration: 2039-11-04
Also published as: CN110781683A

Abstract

The invention discloses a method for jointly extracting entity relations based on multi-label labeling and a compound attention mechanism, which comprises the following steps: collecting corpus data for research, removing sentences with relation labels of None, and performing multi-label labeling on the rest sentences to form a training set; inputting the sentences marked by the multiple labels into a joint extraction model, and identifying entities and relations among the entities contained in the sentences through the joint extraction model to construct triples; and correcting the extracted triples by using the relation alignment model to adapt to multi-label labeling of the (head entity E1 and tail entity E2) entity pairs. The invention has the following effects: the method can effectively improve the accuracy of triplet extraction, and is an effective tool for extracting information aiming at unstructured data.

Description

Entity relation joint extraction method

Technical Field

The invention relates to the technical field of information extraction and natural language processing, in particular to a method for jointly extracting entity relations.

Background

With the rapid development of internet technology, the amount of data that people need to process increases rapidly, and how to extract entity and relationship information between entities from texts in these open fields rapidly and efficiently becomes an important problem that needs to be solved urgently. The entity relation extraction is a core task for extracting information aiming at unstructured data, and mainly aims to detect entities from texts and identify semantic relations among entity pairs at the same time, so that the method is widely applied to the aspects of knowledge graph construction, information retrieval, dialogue generation, question-answering systems and the like. At present, two frameworks of a pipeline method and a joint learning method are generally adopted for entity relation extraction. Traditional extraction tasks typically employ a "pipeline" approach to extract entities first and then identify relationships between the entities. The "pipelined" approach is very convenient to handle, but ignores the correlation of the two subtasks, creating a superposition of errors. Different from the 'pipeline' method, the joint extraction model can extract the relation between the entities from the text while extracting the entities, so as to avoid error accumulation caused by the pipeline method. However, the existing joint learning method still has the problems that the overlapping relation cannot be identified, the more abundant context information in the sentences cannot be learned, the extraction result is not corrected, and the like, so that the triad extraction accuracy is low. The main challenges at present are how to improve the accuracy of triplet extraction, and the solution of these problems depends largely on three aspects of quality of data annotation, performance of the model itself and correction of extraction results.

In the aspect of data annotation, the annotation granularity is gradually thinned, from an early ' IO ' annotation system, a ' BIO ' annotation system and a recently proposed ' BIOES ' annotation system, (wherein ' B ' indicates that the word is located at the initial position of an entity, ' I ' indicates that the word is located at the middle position of the entity, ' E ' indicates that the word is located at the tail end position of the entity, S ' indicates that the word is an entity, and ' O ' indicates that the word is a non-entity); the labeling systems described above contain entity information and relationship information between entities. Based on these labeling schemes, a joint extraction model is then used to implement the joint extraction task. However, most of the existing labeling methods are based on single-label labeling, and have some defects in the recognition of overlapping relations, and neglect the problem that one word has multiple labels and one word can appear in multiple triples.

Based on the data annotation, a joint extraction model needs to be overlapped to complete the joint extraction task. The joint extraction model existing at present is mostly feature-based, relies heavily on complex features, and makes it difficult to utilize global features. In order to automatically learn global features, an end-to-end model based on an encoding-decoding (Encoder-Decoder) framework is commonly adopted in the industry at present, and better experimental results are obtained on joint extraction tasks. However, this architecture has a problem in that the encoder uses a fixed window context vector for the internal representation, and does not obtain more rich context information, and thus has poor performance over long input or output sequences.

In terms of correction of the extraction result, if only the label of the overlapping relation is considered on the training set, recognition of the overlapping relation is not considered on the extraction result, and the extraction accuracy of the triples is reduced. For example, when data is marked, multiple relations exist between entity pairs, and if a result of only one single relation between entity pairs is predicted by the extraction model, prediction loss of the entity pairs in multiple relation classification can be brought.

In order to solve the above problems, a new extraction model needs to be provided to extract the entity and the relationship between the entities, which is a necessary measure in the information extraction field.

Disclosure of Invention

Aiming at the problems that the overlapping relation cannot be identified, the more abundant context information in sentences cannot be learned, the extraction result is not corrected and the like in the conventional joint learning method, the invention aims to provide the entity relation joint extraction method based on multi-label labeling and a compound attention mechanism, which can realize direct modeling of triples, avoid error accumulation caused by respectively extracting the entity and the relation between the entities and is an effective tool for information extraction and natural language processing.

In order to achieve the above object, the present invention is realized by the following technical scheme:

a method for entity relation joint extraction comprises the following steps:

performing multi-label labeling on corpus data to be processed;

inputting the sentences marked by the multiple labels into a joint extraction model, and identifying entities and relations among the entities contained in the sentences through the joint extraction model to construct triples;

and correcting the extracted triples by using the relation alignment model.

The entity relation joint extraction method includes: and collecting corpus data for research, removing sentences with the relation labels of None, and performing multi-label labeling on the rest sentences to form a training set.

The above-mentioned entity relation joint extraction method, wherein the corpus data includes complete information of sentences, sentence numbers, article numbers, mentioned relation in sentences, and entity information mentioned in sentences; the multi-label labeling refers to labeling each word in a sentence with one or more labels by using a three-segment labeling method.

The three-section labeling method is a prior art method and comprises the following steps: the characteristics of each tag are divided into three parts of relation type, belonged entity and position information:

relationship type: identifying a relationship type of the entity pair with a preset relationship tag, for example, "/peer/person/place_level" represents a residence relationship, and when there is a (person) entity pair and the relationship between the entity pairs is a residence relationship, the relationship between the entity pairs may be represented by "/peer/person/place_level";

the entity: with E1 representing the belonging to entity 1, E2 representing the belonging to entity 2, (in the present invention, E1 represents the head entity, E2 represents the tail entity);

position information: the position of each word in an entity is identified using "BMLO," where "B" indicates that the word is located at the beginning of the entity, "M" indicates that the word is located at the middle of the entity, "L" indicates that the word is located at the end of the entity, and "O" indicates that it is not an entity.

The entity relation joint extraction method comprises an embedding layer for mapping words in a high-dimensional discrete space to vectors in a low-dimensional continuous space, a Bi-directional long-short-term memory network Bi-LSTM coding layer for capturing semantic information of each word, a conditional random field CRF decoding layer for labeling a linear data sequence and a composite attention layer for comprehensively considering three characteristics of a calculation region, used information and structural hierarchy.

The method for constructing the triples specifically comprises the following steps: the joint extraction model performs word vector and position vector splicing on the labeled corpus data of the multiple labels at the embedding layer, and then inputs the spliced vectors to the neural network coding layer; the combined extraction model jointly learns the rich context information of the input token through the Bi-LSTM of the neural network coding layer and the composite attention mechanism of the neural network coding layer; the joint extraction model predicts the label of each word through CRF decoding of the decoding layer, and selects head entities and tail entities with relation through a composite attention mechanism of the decoding layer; finally, the triples are output through the output layer.

The above rich words are terms in the technical field of the present invention, and more feature information, such as word features, syntax features and semantic features, can be known.

The specific method for splicing the word vector and the position vector is as follows: numbering each position of the language data, enabling each number to correspond to a position vector P, and combining word vectors W of the position to obtain an input vector input= [ W, P ] of a final Bi-LSTM coding layer.

At the embedding layer, the input word is represented in two aspects, namely a word vector W, a word position vector P. The word vector W is obtained by using word2vec, and the word2vec model is used for converting words in natural language into dense vectors which can be understood by a computer. The position vector P of words determines the position of each word, or the distance between different words in the sequence, representing a local or even global structure. The invention realizes position coding by using sine and cosine functions, and not only can acquire absolute position information of words, but also can acquire relative position information. The absolute position information is achieved by the following formula:

PE(pos,2i)＝sin(pos/10000 ^2i/d _model ) (1)

PE(pos,2i+1)＝cos(pos/10000 ^2i/d _model ) (2)

the above formula maps the value of position code pos to a d _model Position vector of dimension. The even positions of this vector are sine coded, the values being PE (pos, 2 i); the odd positions of this vector are cosine coded and the value is PE (pos, 2i+1).

Through the formula, each position of the input sequence can be numbered, then each number corresponds to one vector P, and the word vector W of the position is combined to obtain the input vector input= [ W, P ] of the final coding layer. By repeating the above operation for each word in the input sequence, a dimension matrix (input_length) of a specific input sequence is returned (emb_dim) and input to the Bi-LSTM layer. Where input_length is the length of the input sequence and emm_dim represents the embedded dimension.

In the sequence labeling problem, the Bi-LSTM encoding layer can effectively capture semantic information of each word, and comprises a forward LSTM layer, a backward LSTM layer and a connection layer. For each word W _t The forward LSTM layer will take into account the slave word W ₁ To W _t Encoding W with context information of (a) _t Labeled asAnd so on, the backward LSTM layer will be based on the slave W _n To W _t Encoding W with context information of (a) _t Marked->Finally, by means of the connection->And->To represent W _t Is expressed as->Wherein the forward LSTM layer structure is as shown in FIG. 4, which can be described as +.>The backward LSTM layer can be described as +.>Wherein x is _t Input information indicating a certain moment, +.>Indicating the state of the forward cell and,indicating the backward cell state.

Further, the invention adopts a linear conditional random field CRF decoding layer which is suitable for labeling linear data sequences, the conditional random field CRF decoding layer takes the data sequence labeling problem as a probability distribution problem, P (y|x) is used for simulating probability distribution, wherein x represents an observation sequence, y represents a tag sequence and is used for labeling word boundaries in sentences, the two have the same chain structure and the same sequence length, and P (y|x) can be obtained by the following equation:

wherein Z (X) is a normalization factor, f _k As a feature function, representing the features of the output node of the observed sequence at the i and i-1 positions; g _k Features representing input-output nodes at i-position; lambda (lambda) _k And alpha _k The weight of the feature function is represented, k represents the feature, and i represents the position.

Further, the composite attention layer of the coding layer used in the invention comprehensively considers three parts of characteristics of the calculation area, the used information and the structural hierarchy.

From the point of view of the calculation region, the coding layer adopts Global Attention (Global Attention), and the decoding layer uses Local Attention (Local Attention). The two types of Attention differ: global Attention is used for solving weight probability for all keys, and each key has a corresponding weight, so that the Global computing mode is adopted. This way, the rationality is compared and the contents of all keys are referred to for weighting. However, this global calculation method may cause a relatively large calculation amount; local Attention is calculated for a window area, which is first located somewhere, and a window area can be obtained with this point as the center, and the Attention is calculated in this small area.

From the point of view of the information used, the whole framework of the joint extraction method adopts Self-Attention (Self-Attention), and the Self-Attention only considers text Self-information and does not consider additional information except the original text. The key feature of Self-attribute is that query=key=value, and only the input sequence is related. Each word in the sentence can be subjected to Attention calculation with other words, which is equivalent to searching for a link inside the sequence. Where query represents a query, key represents a key, and vaule represents a value.

The structure layer adopts a multi-layer Attention model; whether to divide the hierarchy may be decided according to the overall framework.

The multi-layer Attention model is calculated by the following formula:

MultiHead(Q,K,V)＝Concat(head ₁ ,...head _h )W ⁰ (5)

multi-head (Q, K, V) represents the result of a Multi-head splice, Q, K, V being the short for query, key and valid, respectively, head _i Representing the results of the computation of the different Attention sublayers, each head maintains an independent matrix of query weightsKey weight matrix->Sum weight matrix->Thereby generating different query matrices, key matrices and value matrices, each head being calculated by +.>Obtaining a result; l (L) _x Representing the sequence length of each sub-layer, a _i Is Value of _i Corresponding weight coefficient, sim _i Based on Query and a Key _i To calculate the similarity of the two, j is the sequence index of each sub-layer, sim _j Is the scaled dot product attention of the sequence index for each sub-layer.

Is a value of dimension K, sim of the coding layer and decoding layer of the present invention _i The calculation needs to be distinguished, and all keys of the coding layer participate in the calculation.

The similarity calculation formula of the coding layer is as follows:

and the decoding layer only participates in calculation of keys with consistent relation types through the multi-label labeling result. The similarity calculation formula of the decoding layer is as follows:

f (Tag) is a switching function used to distinguish between tags associated with and not associated with the target triplet. Tag is a relationship Tag for a target triplet. tag (i) is used to obtain a relationship tag for entity labeling at location i.

The entity relation joint extraction method utilizes the relation alignment model based on translation to form a relation set by similar relation, and corrects the extracted triples by utilizing the relation set.

The above-mentioned similarity relationship refers to two different relationship words having the same meaning as the meaning, such as words of two different relationships of good friends and friends, but the meaning is the same.

The specific method for forming the relation set by the entity relation joint extraction method comprises the following steps:

given a triplet (h, r ₁ T), requiring that h+r be satisfied between the extracted entity and relationship ₁ Translational constraint of approximately t, whose scoring function is expressed as

If another triplet (h, r ₂ T) whose extracted entities and relationships satisfy h+r ₂ Translational constraint of approximately t, whose scoring function is expressed as

From the two constraints above we can consider f (h, r ₁ ,t)≈f(h,r ₂ T), then r ₁ ≈r ₂ By analogy, the relationships satisfying the translation constraint form a relationship set r= { R ₁ ,r ₂ ,...r _n }；

Wherein h, r ₁ 、r ₂ …r _n The meaning of t is the head entity in the triplet, the relationship in the triplet and the tail entity in the triplet, respectively.

The specific method for correcting the extracted triples by using the relation set is as follows: if the relation classification prediction of the entity pair is one of the relation sets, the relation of the entity pair is represented by any one of the relation sets, so that the problem of the reduction of the triad extraction accuracy caused by the single relation classification prediction is avoided.

The beneficial effects of the invention are as follows:

(1) The method comprises the steps of decomposing a plurality of labels of a word into a plurality of independent data labeling problems by using the multi-label labeling to represent the plurality of labels of the word, so as to solve the problem of identifying the overlapping relationship;

(2) The problem of fixed window existing in the traditional encoding-decoding (Encoder-Decoder) architecture is solved by using an entity relation joint extraction method based on a composite attention mechanism, so that an Encoder learns more abundant context information; the decoder can learn the label information related to the current triples when decoding the triples, so that the influence of some invalid labels is eliminated, and the extraction efficiency of the effective triples is improved;

(3) In order to reduce the prediction loss of (head entity, tail entity) (E1, E2) entity pair in multi-relation classification, the invention also adds a relation alignment function in the prediction result of relation classification, which is used for correcting the triplet extraction result so as to adapt to multi-label labeling of (E1, E2) entity pair.

Drawings

The invention is described in detail below with reference to the attached drawing figures and the detailed description:

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 is a diagram of an exemplary multi-label according to the present invention;

FIG. 3 is a diagram of a joint extraction model of the present invention;

FIG. 4 is a forward LSTM graph of the present invention;

fig. 5 is a multi-headed attention diagram of the present invention.

Detailed Description

The invention is further described in connection with the following detailed description, in order to make the technical means, the creation characteristics, the achievement of the purpose and the effect of the invention easy to understand.

As shown in fig. 1, the entity relationship joint extraction method of the present invention includes the following steps: corpus data of research is collected, sentences with relation labels of "None" are removed, for example, triples such as { "E1": "Minnesota", "E2": "Tim Pawlenty", "label": "None" }, where Minnesota is a place name, tim Pawlenty is a person name, label (label) indicates that the relation between the two entities is "None", and None indicates that the two entities have no relation. Performing multi-label labeling on the rest sentences to form a training set; inputting the sentences marked by the multiple labels into a joint extraction model, and identifying entities contained in the sentences and the relation among the entities through the joint extraction model to construct triples; the extracted triples are corrected by using the relation alignment to adapt to multi-label labeling of the (head entity E1 and tail entity E2) entity pairs.

The method comprises the following specific implementation steps:

step one: in the corpus analysis process, we find that the relation label with some triples is "None", in order to eliminate the influence of the "None" label on the training process, we first exclude the triples with the relation label of "None", and make multi-label labeling on the rest sentences.

For the example of multi-label labeling, as shown in FIG. 2, we find that four triples are contained in the input sentence:

{Bobby Fischer，/people/person/nationality，Iceland}，

{Bobby Fischer，/people/deceased_person/place_of_death，Reykjavik}，

{Iceland，/location/country/capital，Reykjavik}，

{Iceland，/location/location/contains，Reykjavik}。

wherein "/peer/person/location", "/peer/facejperson/place_of_desath", "/location/count/potential" and "/location/location/contacts" are predefined relationship types for the dataset. "Bobby Fischer", "Iceland" and "Reykjavik" are entities contained in triples. These entities have different semantics in different triples and therefore need to be labeled according to a multi-label labeling scheme. The "Bobby Fischer" has two different data labels, wherein the labels "/scope/person/location_e1b" and "/scope/facejperson/place_of_location_e1b" of "Bobby" correspond to "/scope/person/location_e1l" and "/scope/location_person/place_of_location_e1l", thereby realizing the function that one word has multiple labels and one sentence has multiple triples.

Step two: inputting sentences marked by multiple labels into a joint extraction model, wherein the joint extraction model comprises an embedded layer, a Bi-directional long-short-term memory network Bi-LSTM coding layer, a conditional random field CRF decoding layer and a composite attention layer, and the concrete steps are as follows:

as shown in FIG. 3, wherein P represents a position vector, W represents a word vector, bi-LSTM represents an encoding layer, and is collectively referred to as a two-way long-short-term memory network; h represents a hidden layer; CRF represents a decoding layer, collectively called conditional random field; t denotes a label, and a denotes an attention vector.

The joint extraction model performs word vector and position vector splicing on the corpus subjected to multi-label labeling at an embedding layer, and then inputs the spliced vectors to a neural network coding layer; the joint extraction model jointly learns the rich context information of the input token through a two-way long-short-term memory network (Bi-LSTM) of the coding layer and a composite attention mechanism of the coding layer; the joint extraction model is decoded by a Conditional Random Field (CRF) of a decoding layer, the label of each word is predicted, and a head entity and a tail entity with a relation are selected through a composite attention mechanism of the decoding layer; finally, the triples are output through the output layer.

(1) Embedding layer

For mapping discrete words in an instance into a continuous input embedding. At the embedding layer, the input word is represented in two aspects, namely a word vector W, a word position vector P. The word vector W is obtained by using word2vec, and the word2vec model is used for converting words in natural language into dense vectors which can be understood by a computer. The position vector P of words determines the position of each word, or the distance between different words in the sequence, representing a local or even global structure. The invention realizes position coding by using sine and cosine functions, and not only can acquire absolute position information of words, but also can acquire relative position information. The absolute position information is achieved by the following formula:

the above formula maps the value of position code pos to a d _model Position vector of dimension. The even positions of this vector are sine coded, the values being PE (pos, 2 i); the odd positions of this vector are cosine coded and the value is PE (pos, 2i+1). Through the formula, each position of the input sequence can be numbered, then each number corresponds to a vector P, and the word vector W of the position is combined to obtain the input vector input= [ W, P ] of the final coding layer]. By repeating the above operation for each word in the input sequence, a dimension matrix (input_length) of a specific input sequence is returned (emb_dim) and input to the Bi-LSTM layer. Where input_length is the length of the input sequence and emm_dim represents the embedded dimension.

(2) Bi-LSTM coding layer

In the sequence labeling problem, the Bi-LSTM encoding layer can effectively capture semantic information of each word, and comprises a forward LSTM layer, a backward LSTM layer and a connection layer. For each word W _t The forward LSTM layer will pass through the examinationConsider word W ₁ To W _t Encoding W with context information of (a) _t Labeled asAnd so on, the backward LSTM layer will be based on the slave W _n To W _t Encoding W with context information of (a) _t Marked->Finally, by means of the connection->And->To represent W _t Is expressed as->Wherein the forward LSTM layer structure is as shown in FIG. 4, which can be described as +.>The backward LSTM layer can be described as +.>Wherein x is _t Input information indicating a certain moment, +.>Indicating forward cell status,/->Indicating the backward cell state.

(3) CRF decoding layer

The invention adopts a linear Conditional Random Field (CRF) and is suitable for labeling linear data sequences. The conditional random field CRF decoding layer uses the data sequence labeling problem as a probability distribution problem, and simulates the probability distribution by P (y|x), where x represents the observation sequence and y represents the tag sequence, for labeling word boundaries in sentences, both having the same chain structure and the same sequence length, and P (y|x) can be obtained by the following equation:

(4) Composite attention layer

The composite attention layer used in the invention comprehensively considers three characteristics of a calculation area, used information and structural hierarchy:

from the calculation region, the Attention (Attention) of the coding layer uses Global Attention (Global Attention), and the decoding layer uses Local Attention (Local Attention). The two types of Attention differ: global Attention is used for solving weight probability for all keys, and each key has a corresponding weight, so that the Global computing mode is adopted. This way, the rationality is compared and the contents of all keys are referred to for weighting. However, this global calculation method may cause a relatively large calculation amount; local Attention is calculated for a window area, which is first located somewhere, and a window area can be obtained with this point as the center, and the Attention is calculated in this small area.

From the information point of view used, the overall framework of the present invention employs Self-Attention (Self-Attention). Self-Attention only considers text Self information, and does not consider additional information except original text. The key feature of Self-attribute is that query=key=value, and only the input sequence is related. Each word in the sentence can be subjected to Attention calculation with other words, which is equivalent to searching for a link inside the sequence. Where query represents a query, key represents a key, and vaule represents a value.

The invention adopts a multi-layer Attention model in the aspect of structural hierarchy, and can decide whether to divide the hierarchy according to the whole framework. The coding layer Attention focuses on the characteristics of words in the current sentence, and the words in the sentence can be marked with proper labels according to the matching degree of the current words and the labels. The decoding layer Attention focuses on words with the same relation labels, can be positioned to a head entity and a tail entity according to the relation labels, and then performs triplet extraction, and can ignore words irrelevant to the current triples, so that the extraction performance of the triples is improved. A multi-head self-Attention mechanism is used in the coding layer and the decoding layer, a plurality of queries are used for carrying out multi-time Attention on a section of original text, each query pays Attention to different parts of the original text, and finally the results are spliced. A schematic diagram of multi-head Attention is shown in FIG. 5, wherein Q, K, V are short for query, key and variance, linear represents Linear transformation, h represents multiple heads, scaled Dot-product Attention represents scaling Dot product Attention, and Concat represents stitching of scaling Dot products for h times.

The multi-head Attention is mainly calculated by the following formulas:

MultiHead(Q,K,V)＝Concat(head ₁ ,...head _h )W ⁰ (5)

multi-head (Q, K, V) represents the result of a Multi-head splice. head part _i Representing the results of the calculations of the different Attention sublayers. Each head remains independentIs a query weight matrix of (1)Key weight matrix->Sum weight matrix->Thereby producing different query matrices, key matrices, and value matrices. The calculation of each head can be made of +.>The result is obtained. L (L) _x Representing the sequence length of each sub-layer. a, a _i Is Value of _i Corresponding weight coefficients. Sim (Sim) _i Based on Query and a Key _i To calculate the similarity of the two, j is the sequence index of each sub-layer, sim _j Is the scaled dot product attention of the sequence index for each sub-layer. />Is a value of dimension K. Sim of the coding layer and decoding layer of the present invention _i The calculation needs to be distinguished, and all keys of the coding layer participate in the calculation.

The similarity calculation formula of the coding layer is as follows:

Step three: and correcting the extracted triples by using relation alignment, and designing a translation-based relation alignment model inspired by a translation-based knowledge graph embedding model TransE model. Given a triplet (h, r ₁ T), requiring that h+r be satisfied between the extracted entity and relationship ₁ The translation constraint of t, whose scoring function can be expressed as

From the two constraints above we can consider f (h, r ₁ ,t)≈f(h,r ₂ T), then r ₁ ≈r ₂ Similarly, the constrained relationships form a set of relationships r= { R ₁ ,r ₂ ,...r _n (where h is the head entity in the triplet, r) ₁ ,r ₂ ,...r _n R is the tail entity in the triplet, which is the relationship in the triplet. And correcting the extraction result by using the relation set, and if the relation classification of the entity pair predicts one of the relation sets, the relation of the entity pair can be expressed as any one of the relation sets, so that the problem of the reduction of the extraction accuracy of the triples caused by single predicted relation is avoided.

The entity relationship joint extraction method of the invention uses multi-label labeling to represent a plurality of labels of a word, and decomposes the multi-label into a plurality of independent data labeling problems, so as to solve the recognition problem of overlapping relationship; the problem of fixed window existing in the traditional encoding-decoding (Encoder-Decoder) architecture is solved by using an entity relation joint extraction method based on a composite attention mechanism, so that an Encoder learns more abundant context information; the decoder can learn the label information related to the current triples when decoding the triples, so that the influence of some invalid labels is eliminated, and the extraction efficiency of the effective triples is improved; in order to reduce the prediction loss of (head entity, tail entity) (E1, E2) entity pair in multi-relation classification, a relation alignment function is added in the prediction result of relation classification, so as to correct the triplet extraction result to adapt to multi-label labeling of (E1, E2) entity pair. The invention has the following effects: the method can effectively improve the accuracy of triplet extraction, and is an effective tool for extracting information aiming at unstructured data.

The foregoing has shown and described the basic principles and main features of the present invention and the advantages of the present invention. The present invention has been described in the art by way of illustration of the principles of the present invention, and various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. The entity relationship joint extraction method is characterized by comprising the following steps of:

performing multi-label labeling on corpus data to be processed;

the joint extraction model comprises an embedding layer for mapping words in a high-dimensional discrete space to vectors in a low-dimensional continuous space, a Bi-directional long-short-term memory network Bi-LSTM coding layer for capturing semantic information of each word, a conditional random field CRF decoding layer for labeling a linear data sequence and a composite attention layer for comprehensively considering three characteristics of a calculation region, used information and a structural layer;

the joint extraction model performs word vector and position vector splicing on the labeled corpus data of the multiple labels at the embedding layer, and then inputs the spliced vectors to the neural network coding layer; the joint extraction model jointly learns the rich context information of the input token through Bi-LSTM of the neural network coding layer and a composite attention mechanism of the neural network coding layer; the joint extraction model predicts the label of each word through CRF decoding of the decoding layer, and selects head entities and tail entities with relation through a composite attention mechanism of the decoding layer; finally, outputting the triples through an output layer;

correcting the extracted triples by using a relation alignment model;

forming a relation set by using the relation alignment model based on translation and similar relation, and correcting the extracted triples by using the relation set; the specific method for correcting the extracted triples by using the relation set is as follows: if the relationship classification of the entity pair predicts as one of the relationship sets, the relationship of the entity pair is represented by any one of the relationship sets;

the specific method for forming the relation set is as follows:

By the two constraints mentioned above, it is considered that f (h, r ₁ ,t)≈f(h,r ₂ T), then r ₁ ≈r ₂ By analogy, the relationships satisfying the translation constraint form a relationship set r= { R ₁ ,r ₂ ,...r _n }；

2. The method for entity-relationship joint extraction according to claim 1, wherein: the multi-label labeling of the corpus data to be processed comprises the following steps: and collecting corpus data for research, removing sentences with the relation labels of None, and performing multi-label labeling on the rest sentences to form a training set.

3. The method for entity-relationship joint extraction according to claim 1, wherein: the corpus data comprises complete information of sentences, sentence numbers, article numbers, mention relations in the sentences and entity information mentioned in the sentences; the multi-label labeling refers to labeling each word in a sentence with one or more labels by using a three-segment labeling method.

4. The method for entity-relationship joint extraction according to claim 1, wherein: the specific method for splicing the word vector and the position vector comprises the following steps: numbering each position of the language data, enabling each number to correspond to a position vector P, and combining word vectors W of the position to obtain an input vector input= [ W, P ] of a final Bi-LSTM coding layer.

5. The method for entity-relationship joint extraction according to claim 1, wherein: the conditional random field CRF decoding layer uses the data sequence labeling problem as a probability distribution problem, and simulates the probability distribution by using P (y|x), wherein x represents an observation sequence, y represents a tag sequence, and is used for labeling word boundaries in sentences, the two have the same chain structure and the same sequence length, and P (y|x) can be obtained by the following equation:

6. The method for entity-relationship joint extraction according to claim 1, wherein: from the perspective of the calculation region, global attention is adopted; from the point of view of the information used, the overall framework of the joint extraction method adopts self-attention; from the view of the structural hierarchy, a multi-layer Attention model is adopted; wherein the multi-layer Attention model is calculated by the following formula:

MultiHead(Q,K,V)＝Concat(head ₁ ,...head _h )W ⁰ (5)

multihead (Q, K, V) represents the result of a multi-head splice, Q, K, V are short for query, key and queue, respectively, query represents a query, key represents a key, queue represents a value, head _i Representing the results of the computation of the different Attention sublayers, each head maintains an independent matrix of query weightsKey weight matrix->Sum weight matrix->Thereby generating different query matrices, key matrices and value matrices, each head being calculated by +.>Obtaining a result; l (L) _x Representing the sequence length of each sub-layer, a _i Is Value of _i Corresponding weight coefficient, sim _i Based on Query and a Key _i To calculate the similarity of the two, j is the sequence index of each sub-layer, sim _j Is the scaled dot product attention of the sequence index for each sub-layer.