CN110781683B - Entity relation joint extraction method - Google Patents

Entity relation joint extraction method

Info

Publication number
CN110781683B
CN110781683B (application CN201911063750.2A)
Authority
CN
China
Prior art keywords: entity, layer, relation, relationship, attention
Prior art date
Legal status
Active
Application number
CN201911063750.2A
Other languages
Chinese (zh)
Other versions
CN110781683A (en)
Inventor
冯钧
杭婷婷
李晓东
陆佳民
严乐
朱跃龙
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN201911063750.2A priority Critical patent/CN110781683B/en
Publication of CN110781683A publication Critical patent/CN110781683A/en
Application granted granted Critical
Publication of CN110781683B publication Critical patent/CN110781683B/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0463 Neocognitrons
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an entity relation joint extraction method based on multi-label labeling and a composite attention mechanism, comprising the following steps: collecting corpus data for research, removing sentences whose relation label is "None", and performing multi-label labeling on the remaining sentences to form a training set; inputting the multi-label-labeled sentences into a joint extraction model, which identifies the entities contained in each sentence and the relations among them to construct triples; and correcting the extracted triples with a relation alignment model, so as to fit the multi-label labeling of (head entity E1, tail entity E2) entity pairs. The invention has the following effect: it effectively improves the accuracy of triple extraction and is an effective tool for information extraction from unstructured data.

Description

Entity relation joint extraction method
Technical Field
The invention relates to the technical field of information extraction and natural language processing, in particular to a method for jointly extracting entity relations.
Background
With the rapid development of Internet technology, the amount of data people need to process is growing quickly, and extracting entities and the relations between them from open-domain texts rapidly and efficiently has become an urgent problem. Entity relation extraction is the core task of information extraction from unstructured data: it detects entities in text while identifying the semantic relations between entity pairs, and is widely applied to knowledge graph construction, information retrieval, dialogue generation, question-answering systems and the like. At present, entity relation extraction generally follows one of two frameworks: the pipeline method and the joint learning method. Traditional extraction tasks typically adopt the "pipeline" approach, extracting entities first and then identifying the relations between them. The pipeline approach is very convenient, but it ignores the correlation between the two subtasks, causing errors to accumulate. Unlike the pipeline method, a joint extraction model extracts entities and the relations between them from text simultaneously, avoiding the error accumulation of the pipeline method. However, existing joint learning methods still cannot identify overlapping relations, cannot learn the richer context information in sentences, and do not correct the extraction results, so triple extraction accuracy remains low. The main challenge at present is how to improve the accuracy of triple extraction, which depends largely on three aspects: the quality of data annotation, the performance of the model itself, and the correction of the extraction results.
In terms of data annotation, annotation granularity has gradually become finer, from the early "IO" annotation scheme through the "BIO" scheme to the recently proposed "BIOES" scheme (where "B" indicates the word is at the beginning of an entity, "I" that it is inside an entity, "E" that it is at the end of an entity, "S" that the word is a single-word entity, and "O" that it is not part of an entity); these schemes encode entity information and the relation information between entities. Based on such labeling schemes, a joint extraction model is then used to carry out the joint extraction task. However, most existing labeling methods are single-label, leaving them deficient at recognizing overlapping relations: they neglect that one word may carry multiple labels and may appear in multiple triples.
On top of the data annotation, a joint extraction model is needed to complete the joint extraction task. Most existing joint extraction models are feature-based; they rely heavily on complex hand-crafted features and find it difficult to exploit global features. To learn global features automatically, the field currently favors end-to-end models built on the encoding-decoding (Encoder-Decoder) framework, which have achieved good experimental results on joint extraction tasks. This architecture has a weakness, however: the encoder uses a fixed-window context vector for its internal representation and cannot capture richer context information, so it performs poorly on long input or output sequences.
In terms of correcting the extraction results, if overlapping relations are labeled only in the training set and their recognition is not enforced on the extraction results, triple extraction accuracy drops. For example, when the data is annotated, multiple relations may hold between an entity pair; if the extraction model predicts only a single relation for that pair, it incurs a prediction loss on the pair's multi-relation classification.
To solve the above problems, a new extraction model is needed to extract entities and the relations between them; this is a necessary step for the information extraction field.
Disclosure of Invention
Aiming at the problems of existing joint learning methods, namely that overlapping relations cannot be identified, richer context information in sentences cannot be learned, and extraction results are not corrected, the invention provides an entity relation joint extraction method based on multi-label labeling and a composite attention mechanism. It models triples directly, avoids the error accumulation caused by extracting entities and their relations separately, and is an effective tool for information extraction and natural language processing.
In order to achieve the above object, the present invention is realized by the following technical scheme:
a method for entity relation joint extraction comprises the following steps:
performing multi-label labeling on corpus data to be processed;
inputting the sentences marked by the multiple labels into a joint extraction model, and identifying entities and relations among the entities contained in the sentences through the joint extraction model to construct triples;
and correcting the extracted triples by using the relation alignment model.
In the above entity relation joint extraction method, performing multi-label labeling on the corpus data to be processed includes: collecting corpus data for research, removing sentences whose relation label is "None", and performing multi-label labeling on the remaining sentences to form a training set.
In the above entity relation joint extraction method, the corpus data includes the complete sentence text, sentence numbers, article numbers, the relations mentioned in the sentences, and the entity information mentioned in the sentences; multi-label labeling means assigning each word in a sentence one or more labels using a three-segment labeling method.
The three-segment labeling method is known in the art; each tag is divided into three parts, namely relation type, entity membership and position information (a brief code sketch follows the list):
relation type: a preset relation label identifies the relation type of the entity pair; for example, "/people/person/place_lived" represents a residence relation, and when an entity pair stands in a residence relation, the relation between the pair can be represented by "/people/person/place_lived";
entity membership: E1 denotes that a word belongs to entity 1 and E2 that it belongs to entity 2 (in the present invention, E1 is the head entity and E2 the tail entity);
position information: the position of each word in an entity is identified using "BMLO," where "B" indicates that the word is located at the beginning of the entity, "M" indicates that the word is located at the middle of the entity, "L" indicates that the word is located at the end of the entity, and "O" indicates that it is not an entity.
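To make the three-segment scheme concrete, the following is a minimal Python sketch of how such tags can be composed. The helper name make_tag, the string layout <relation>_<role><position>, and the example relation names are illustrative assumptions rather than notation fixed by the invention.

```python
# Hypothetical helper for composing three-segment multi-label tags.
# Assumed tag layout: <relation-type>_<role><position>.

def make_tag(relation: str, role: str, position: str) -> str:
    """Compose one tag from relation type, entity role (E1/E2), BMLO position."""
    assert role in {"E1", "E2"} and position in {"B", "M", "L", "O"}
    if position == "O":        # non-entity words carry the bare "O" tag
        return "O"
    return f"{relation}_{role}{position}"

# A word taking part in two triples receives two tags (multi-label labeling):
tags_for_word = [
    make_tag("/people/person/nationality", "E1", "B"),
    make_tag("/people/deceased_person/place_of_death", "E1", "B"),
]
```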
In the above entity relation joint extraction method, the joint extraction model comprises an embedding layer that maps words from a high-dimensional discrete space to vectors in a low-dimensional continuous space, a bidirectional long short-term memory (Bi-LSTM) encoding layer that captures the semantic information of each word, a conditional random field (CRF) decoding layer that labels the linear data sequence, and a composite attention layer that jointly considers three aspects: the computation region, the information used, and the structural hierarchy.
Constructing the triples specifically comprises the following steps: at the embedding layer, the joint extraction model splices the word vector and position vector of the multi-label-labeled corpus data, then feeds the spliced vectors to the neural network encoding layer; the joint extraction model jointly learns the rich context information of the input tokens through the Bi-LSTM of the encoding layer and the encoding layer's composite attention mechanism; it predicts the label of each word through CRF decoding at the decoding layer, and selects related head and tail entities through the decoding layer's composite attention mechanism; finally, the triples are output through the output layer.
"Rich context information" above is a term of art in this technical field, referring to additional feature information such as lexical features, syntactic features and semantic features.
The specific method for splicing the word vector and the position vector is as follows: number each position of the corpus data, map each number to a position vector P, and combine it with the word vector W at that position to obtain the final input vector input = [W, P] for the Bi-LSTM encoding layer.
At the embedding layer, each input word is represented in two ways: a word vector W and a word position vector P. The word vector W is obtained with word2vec, a model that converts natural language words into dense vectors a computer can work with. The position vector P fixes the position of each word, or the distance between different words in the sequence, representing local or even global structure. The invention implements position encoding with sine and cosine functions, which yields both absolute and relative position information. The absolute position information is computed by the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))   (1)
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))   (2)
The formulas above map the value of the position code pos to a d_model-dimensional position vector. The even dimensions of this vector are sine-encoded, with values PE(pos, 2i); the odd dimensions are cosine-encoded, with values PE(pos, 2i+1).
With these formulas, every position of the input sequence can be numbered; each number then corresponds to a vector P, which is combined with the word vector W at that position to obtain the final encoding-layer input vector input = [W, P]. Repeating this operation for every word in the input sequence yields a matrix of dimensions (input_length, emb_dim), which is fed to the Bi-LSTM layer. Here input_length is the length of the input sequence and emb_dim is the embedding dimension.
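The following is a minimal numpy sketch of this step: formulas (1) and (2) build the position vectors, which are then spliced with stand-in word vectors to form input = [W, P]. The concatenation-based splice, the even embedding dimension, and all array shapes are assumptions for illustration.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position vectors per formulas (1) and (2); assumes even d_model."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]      # even feature indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                    # even dims: sine, PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                    # odd dims: cosine, PE(pos, 2i+1)
    return pe

# Splice word vectors W (e.g. word2vec lookups) with position vectors P.
seq_len, emb_dim = 12, 64
W = np.random.randn(seq_len, emb_dim)              # stand-in for word2vec vectors
P = positional_encoding(seq_len, emb_dim)
inputs = np.concatenate([W, P], axis=-1)           # input = [W, P] for the Bi-LSTM
```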
In sequence labeling problems, the Bi-LSTM encoding layer effectively captures the semantic information of each word; it comprises a forward LSTM layer, a backward LSTM layer and a connection layer. For each word W_t, the forward LSTM layer encodes W_t by considering the context information from W_1 to W_t, producing the forward state h_t^f; likewise, the backward LSTM layer encodes W_t from the context information from W_n to W_t, producing the backward state h_t^b. Finally, h_t^f and h_t^b are connected to represent the semantic information of W_t, expressed as h_t = [h_t^f; h_t^b]. The forward LSTM layer, whose structure is shown in FIG. 4, can be described as h_t^f = LSTM(x_t, c_{t-1}^f); the backward LSTM layer can be described as h_t^b = LSTM(x_t, c_{t-1}^b); where x_t is the input information at a given moment, c^f indicates the forward cell state, and c^b the backward cell state.
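A corresponding encoder sketch in PyTorch is shown below; with bidirectional=True, nn.LSTM returns exactly the concatenation [h_t^f; h_t^b] described above. The layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Bi-LSTM encoder: the output at step t is [h_t^f ; h_t^b]."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim), the spliced [W, P] vectors.
        h, _ = self.lstm(x)   # h: (batch, seq_len, 2 * hidden_dim)
        return h

encoder = BiLSTMEncoder(input_dim=128, hidden_dim=100)
h = encoder(torch.randn(2, 12, 128))   # -> torch.Size([2, 12, 200])
```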
Further, the invention adopts a linear-chain conditional random field (CRF) decoding layer, which is suited to labeling linear data sequences. The CRF decoding layer treats the sequence labeling problem as a probability distribution problem and models the distribution P(y|x), where x denotes the observation sequence and y the tag sequence used to mark word boundaries in the sentence; the two share the same chain structure and the same sequence length. P(y|x) is obtained by the following equation:

P(y|x) = (1/Z(x)) · exp( Σ_{i,k} λ_k f_k(y_{i-1}, y_i, x, i) + Σ_{i,k} α_k g_k(y_i, x, i) )   (3)

where Z(x) is a normalization factor; f_k is a feature function representing the features of the output nodes of the observation sequence at positions i-1 and i; g_k represents the features of the input-output nodes at position i; λ_k and α_k are the weights of the feature functions; k indexes the features and i the positions.
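A toy numpy sketch of equation (3) follows. Collapsing the weighted feature sums Σ_k λ_k f_k and Σ_k α_k g_k into precomputed transition and emission score matrices is a simplifying assumption made to keep the sketch short; Z(x) is computed with the standard forward algorithm in log space.

```python
import numpy as np
from scipy.special import logsumexp

def crf_log_prob(emissions: np.ndarray, transitions: np.ndarray, tags) -> float:
    """log P(y|x) for one tag sequence under a linear-chain CRF (equation (3)).

    emissions[i, y]    collapses the weighted g_k features at position i;
    transitions[y, y'] collapses the weighted f_k features for (y_{i-1}, y_i).
    """
    score = emissions[0, tags[0]]
    for i in range(1, len(tags)):
        score += transitions[tags[i - 1], tags[i]] + emissions[i, tags[i]]

    # Z(x): sum over all tag sequences via the forward algorithm (log space).
    alpha = emissions[0]
    for i in range(1, len(tags)):
        alpha = emissions[i] + logsumexp(alpha[:, None] + transitions, axis=0)
    return float(score - logsumexp(alpha))

# Toy usage: a 4-word sentence with 3 candidate labels.
rng = np.random.default_rng(0)
log_p = crf_log_prob(rng.normal(size=(4, 3)), rng.normal(size=(3, 3)), [0, 2, 1, 0])
```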
Further, the composite attention layer used in the invention jointly considers three aspects: the computation region, the information used, and the structural hierarchy.
From the viewpoint of the computation region, the encoding layer adopts Global Attention and the decoding layer uses Local Attention. The two kinds of Attention differ as follows: Global Attention computes a weight probability over all keys, each key receiving its own weight, so the computation is global. This is the more principled approach, since all keys' contents are consulted when weighting, but the global computation is comparatively expensive. Local Attention computes over a window region: a point is located first, a window region is taken around that point, and attention is computed inside this small region.
From the viewpoint of the information used, the overall framework of the joint extraction method adopts Self-Attention, which considers only the text's own information and no additional information beyond the original text. The defining property of Self-Attention is query = key = value: everything is derived from the input sequence alone. Every word in the sentence can be attended against every other word, which amounts to finding the links inside the sequence. Here query denotes a query, key a key, and value a value.
In terms of the structural hierarchy, a multi-layer Attention model is adopted; whether to divide into hierarchies can be decided according to the overall framework.
The multi-layer Attention model is computed by the following formulas:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O   (5)

MultiHead(Q, K, V) is the result of the multi-head splice; Q, K, V are short for query, key and value respectively. head_i is the result computed by the i-th Attention sublayer; each head maintains an independent query weight matrix W_i^Q, key weight matrix W_i^K and value weight matrix W_i^V, thereby generating different query, key and value matrices, and each head's result is obtained as

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

Attention(Q, K, V) = Σ_{i=1}^{L_x} a_i · Value_i,  where a_i = exp(Sim_i) / Σ_{j=1}^{L_x} exp(Sim_j)

Here L_x is the sequence length of each sublayer, a_i is the weight coefficient corresponding to Value_i, Sim_i is the similarity computed between the Query and a Key_i, j is the sequence index within each sublayer, Sim_j is the scaled dot-product attention score at sequence index j, and d_k is the dimension of the keys. The Sim_i of the encoding layer and of the decoding layer are computed differently: in the encoding layer, all keys participate in the computation.

The similarity formula of the encoding layer is:

Sim_i = (Query · Key_i) / √d_k   (6)

The decoding layer, guided by the multi-label labeling result, lets only keys with a consistent relation type participate in the computation. The similarity formula of the decoding layer is:

Sim_i = f(Tag) · (Query · Key_i) / √d_k   (7)

f(Tag) is a switching function used to distinguish tags associated with the target triple from those that are not; Tag is the relation tag of the target triple; tag(i) retrieves the relation tag of the entity labeled at position i; f(Tag) = 1 when tag(i) = Tag and 0 otherwise.
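The decoding-layer similarity with the f(Tag) switch can be sketched as follows. Treating f(Tag) as a 0/1 mask applied before the softmax is the reading suggested by the "switching function" description; the helper below is a hypothetical illustration rather than the patented implementation, and it assumes at least one key carries the target tag.

```python
import numpy as np

def decoding_layer_attention(query, keys, values, key_tags, target_tag):
    """Sketch of the decoding-layer attention: keys whose relation tag
    differs from the target triple's Tag are masked out before the softmax,
    so they receive zero attention weight (the f(Tag) switch)."""
    d_k = keys.shape[-1]
    sim = keys @ query / np.sqrt(d_k)           # scaled dot-product similarity
    switch = np.array([t == target_tag for t in key_tags])  # f(tag(i)) in {0, 1}
    sim = np.where(switch, sim, -np.inf)        # assumes >= 1 key matches
    a = np.exp(sim - np.max(sim))
    a /= a.sum()                                # softmax over admissible keys
    return a @ values                           # weighted sum of the values
```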
The above entity relation joint extraction method uses a translation-based relation alignment model to group similar relations into a relation set, and corrects the extracted triples with this relation set.
A similar relation here means two different relation words that share the same meaning; for example, "good friend" and "friend" are different relation words with the same meaning.
The specific method for forming the relation set by the entity relation joint extraction method comprises the following steps:
given a triplet (h, r 1 T), requiring that h+r be satisfied between the extracted entity and relationship 1 Translational constraint of approximately t, whose scoring function is expressed as
If another triplet (h, r 2 T) whose extracted entities and relationships satisfy h+r 2 Translational constraint of approximately t, whose scoring function is expressed as
From the two constraints above we can consider f (h, r 1 ,t)≈f(h,r 2 T), then r 1 ≈r 2 By analogy, the relationships satisfying the translation constraint form a relationship set r= { R 1 ,r 2 ,...r n };
Wherein h, r 1 、r 2 …r n The meaning of t is the head entity in the triplet, the relationship in the triplet and the tail entity in the triplet, respectively.
The specific method for correcting the extracted triples with the relation set is as follows: if the relation classification of an entity pair predicts one member of a relation set, the pair's relation can be represented by any member of that set, which avoids the drop in triple extraction accuracy caused by predicting only a single relation.
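A minimal sketch of the translation-based alignment follows. The L2 norm in the scoring function, the tolerance eps, and the assumption that entity and relation embeddings are already trained (e.g. TransE-style vectors) are all illustrative choices, not details fixed by the invention.

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """Scoring function f(h, r, t) = ||h + r - t|| (L2 norm assumed)."""
    return float(np.linalg.norm(h + r - t))

def build_relation_set(h, t, relations: dict, eps: float = 0.5) -> set:
    """Collect relations r whose triple (h, r, t) satisfies h + r ~ t.
    relations maps relation names to embedding vectors; eps is an assumed
    tolerance for the translation constraint."""
    return {name for name, r in relations.items() if transe_score(h, r, t) < eps}

def align_relation(predicted: str, relation_set: set) -> set:
    """If the predicted relation is in the set, any member may represent it."""
    return relation_set if predicted in relation_set else {predicted}
```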
The beneficial effects of the invention are as follows:
(1) Multi-label labeling is used to represent the multiple labels of a word, decomposing them into several independent data labeling problems and thereby solving the recognition of overlapping relations;
(2) The entity relation joint extraction method based on the composite attention mechanism overcomes the fixed-window problem of the traditional encoding-decoding (Encoder-Decoder) architecture, so that the encoder learns richer context information; when decoding a triple, the decoder can attend to the label information related to the current triple, eliminating the influence of invalid labels and improving the extraction efficiency of valid triples;
(3) To reduce the prediction loss of a (head entity, tail entity) = (E1, E2) entity pair in multi-relation classification, the invention also adds a relation alignment function to the relation classification predictions, which corrects the triple extraction results so as to fit the multi-label labeling of (E1, E2) entity pairs.
Drawings
The invention is described in detail below with reference to the attached drawing figures and the detailed description:
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is an example of multi-label labeling according to the present invention;
FIG. 3 is a diagram of a joint extraction model of the present invention;
FIG. 4 is a forward LSTM graph of the present invention;
fig. 5 is a multi-headed attention diagram of the present invention.
Detailed Description
The invention is further described in connection with the following detailed description, in order to make the technical means, the creation characteristics, the achievement of the purpose and the effect of the invention easy to understand.
As shown in fig. 1, the entity relation joint extraction method of the invention includes the following steps: collect the research corpus data and remove sentences whose relation label is "None". For example, in a triple such as {"E1": "Minnesota", "E2": "Tim Pawlenty", "label": "None"}, Minnesota is a place name, Tim Pawlenty is a person name, and the label "None" indicates that the two entities bear no relation. Perform multi-label labeling on the remaining sentences to form a training set; input the multi-label-labeled sentences into the joint extraction model, which identifies the entities contained in the sentences and the relations among them to construct triples; and correct the extracted triples with relation alignment, so as to fit the multi-label labeling of (head entity E1, tail entity E2) entity pairs.
The method comprises the following specific implementation steps:
step one: in the corpus analysis process, we find that the relation label with some triples is "None", in order to eliminate the influence of the "None" label on the training process, we first exclude the triples with the relation label of "None", and make multi-label labeling on the rest sentences.
For the example of multi-label labeling, as shown in FIG. 2, we find that four triples are contained in the input sentence:
{Bobby Fischer,/people/person/nationality,Iceland},
{Bobby Fischer,/people/deceased_person/place_of_death,Reykjavik},
{Iceland,/location/country/capital,Reykjavik},
{Iceland,/location/location/contains,Reykjavik}。
wherein "/peer/person/location", "/peer/facejperson/place_of_desath", "/location/count/potential" and "/location/location/contacts" are predefined relationship types for the dataset. "Bobby Fischer", "Iceland" and "Reykjavik" are entities contained in triples. These entities have different semantics in different triples and therefore need to be labeled according to a multi-label labeling scheme. The "Bobby Fischer" has two different data labels, wherein the labels "/scope/person/location_e1b" and "/scope/facejperson/place_of_location_e1b" of "Bobby" correspond to "/scope/person/location_e1l" and "/scope/location_person/place_of_location_e1l", thereby realizing the function that one word has multiple labels and one sentence has multiple triples.
Step two: inputting sentences marked by multiple labels into a joint extraction model, wherein the joint extraction model comprises an embedded layer, a Bi-directional long-short-term memory network Bi-LSTM coding layer, a conditional random field CRF decoding layer and a composite attention layer, and the concrete steps are as follows:
as shown in FIG. 3, wherein P represents a position vector, W represents a word vector, bi-LSTM represents an encoding layer, and is collectively referred to as a two-way long-short-term memory network; h represents a hidden layer; CRF represents a decoding layer, collectively called conditional random field; t denotes a label, and a denotes an attention vector.
The joint extraction model performs word vector and position vector splicing on the corpus subjected to multi-label labeling at an embedding layer, and then inputs the spliced vectors to a neural network coding layer; the joint extraction model jointly learns the rich context information of the input token through a two-way long-short-term memory network (Bi-LSTM) of the coding layer and a composite attention mechanism of the coding layer; the joint extraction model is decoded by a Conditional Random Field (CRF) of a decoding layer, the label of each word is predicted, and a head entity and a tail entity with a relation are selected through a composite attention mechanism of the decoding layer; finally, the triples are output through the output layer.
(1) Embedding layer
The embedding layer maps the discrete words of an instance into continuous input embeddings. At the embedding layer, each input word is represented in two ways: a word vector W and a word position vector P. The word vector W is obtained with word2vec, a model that converts natural language words into dense vectors a computer can work with. The position vector P fixes the position of each word, or the distance between different words in the sequence, representing local or even global structure. The invention implements position encoding with sine and cosine functions, yielding both absolute and relative position information. The absolute position information is computed by formulas (1) and (2) above, which map the value of the position code pos to a d_model-dimensional position vector whose even dimensions are sine-encoded with values PE(pos, 2i) and whose odd dimensions are cosine-encoded with values PE(pos, 2i+1). With these formulas, every position of the input sequence can be numbered; each number then corresponds to a vector P, which is combined with the word vector W at that position to obtain the final encoding-layer input vector input = [W, P]. Repeating this for every word in the input sequence yields a matrix of dimensions (input_length, emb_dim), which is fed to the Bi-LSTM layer; input_length is the length of the input sequence and emb_dim is the embedding dimension.
(2) Bi-LSTM coding layer
In sequence labeling problems, the Bi-LSTM encoding layer effectively captures the semantic information of each word; it comprises a forward LSTM layer, a backward LSTM layer and a connection layer. For each word W_t, the forward LSTM layer encodes W_t by considering the context information from W_1 to W_t, producing the forward state h_t^f; likewise, the backward LSTM layer encodes W_t from the context information from W_n to W_t, producing the backward state h_t^b. Finally, h_t^f and h_t^b are connected to represent the semantic information of W_t, expressed as h_t = [h_t^f; h_t^b]. The forward LSTM layer, shown in FIG. 4, can be described as h_t^f = LSTM(x_t, c_{t-1}^f), and the backward LSTM layer as h_t^b = LSTM(x_t, c_{t-1}^b), where x_t is the input information at a given moment, c^f indicates the forward cell state, and c^b the backward cell state.
(3) CRF decoding layer
The invention adopts a linear-chain conditional random field (CRF), which is suited to labeling linear data sequences. The CRF decoding layer treats the sequence labeling problem as a probability distribution problem and models the distribution P(y|x), where x denotes the observation sequence and y the tag sequence used to mark word boundaries in sentences; the two share the same chain structure and the same sequence length. P(y|x) is obtained by equation (3):

P(y|x) = (1/Z(x)) · exp( Σ_{i,k} λ_k f_k(y_{i-1}, y_i, x, i) + Σ_{i,k} α_k g_k(y_i, x, i) )   (3)

where Z(x) is a normalization factor; f_k is a feature function representing the features of the output nodes of the observation sequence at positions i-1 and i; g_k represents the features of the input-output nodes at position i; λ_k and α_k are the weights of the feature functions; k indexes the features and i the positions.
(4) Composite attention layer
The composite attention layer used in the invention comprehensively considers three characteristics of a calculation area, used information and structural hierarchy:
from the calculation region, the Attention (Attention) of the coding layer uses Global Attention (Global Attention), and the decoding layer uses Local Attention (Local Attention). The two types of Attention differ: global Attention is used for solving weight probability for all keys, and each key has a corresponding weight, so that the Global computing mode is adopted. This way, the rationality is compared and the contents of all keys are referred to for weighting. However, this global calculation method may cause a relatively large calculation amount; local Attention is calculated for a window area, which is first located somewhere, and a window area can be obtained with this point as the center, and the Attention is calculated in this small area.
From the viewpoint of the information used, the overall framework of the invention adopts Self-Attention, which considers only the text's own information and no additional information beyond the original text. The defining property of Self-Attention is query = key = value: everything is derived from the input sequence alone. Every word in the sentence can be attended against every other word, which amounts to finding the links inside the sequence. Here query denotes a query, key a key, and value a value.
In terms of structural hierarchy, the invention adopts a multi-layer Attention model; whether to divide into hierarchies can be decided according to the overall framework. The encoding-layer Attention focuses on the features of the words in the current sentence, so that each word can be given the appropriate label according to how well the current word matches the label. The decoding-layer Attention focuses on words that carry the same relation label; it can locate the head and tail entities from the relation label and then extract the triple, ignoring words irrelevant to the current triple and thereby improving triple extraction performance. A multi-head self-Attention mechanism is used in both the encoding and decoding layers: several queries attend to the original text repeatedly, each query focusing on a different part of the text, and the results are finally spliced together. FIG. 5 shows the multi-head Attention schematic, where Q, K, V are short for query, key and value, Linear denotes a linear transformation, h denotes the number of heads, Scaled Dot-Product Attention denotes scaled dot-product attention, and Concat denotes splicing the results of the h heads.
The multi-head Attention is mainly computed by the following formulas:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O   (5)

MultiHead(Q, K, V) is the result of the multi-head splice. head_i is the result computed by the i-th Attention sublayer. Each head maintains an independent query weight matrix W_i^Q, key weight matrix W_i^K and value weight matrix W_i^V, thereby producing different query, key and value matrices. Each head's result is computed as

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

Attention(Q, K, V) = Σ_{i=1}^{L_x} a_i · Value_i,  where a_i = exp(Sim_i) / Σ_{j=1}^{L_x} exp(Sim_j)

L_x is the sequence length of each sublayer. a_i is the weight coefficient corresponding to Value_i. Sim_i is the similarity computed between the Query and a Key_i; j is the sequence index within each sublayer, Sim_j is the scaled dot-product attention score at sequence index j, and d_k is the dimension of the keys. The Sim_i of the encoding layer and of the decoding layer are computed differently: in the encoding layer, all keys participate in the computation.

The similarity formula of the encoding layer is:

Sim_i = (Query · Key_i) / √d_k   (6)

The decoding layer, guided by the multi-label labeling result, lets only keys with a consistent relation type participate in the computation. The similarity formula of the decoding layer is:

Sim_i = f(Tag) · (Query · Key_i) / √d_k   (7)

f(Tag) is a switching function used to distinguish tags associated with the target triple from those that are not; Tag is the relation tag of the target triple; tag(i) retrieves the relation tag of the entity labeled at position i; f(Tag) = 1 when tag(i) = Tag and 0 otherwise.
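For completeness, the following is a numpy sketch of formulas (5) and (6) as reconstructed above; the per-head weight lists, all shapes, and the toy data are illustrative assumptions.

```python
import numpy as np

def scaled_dot_attention(Q, K, V):
    """Single-head scaled dot-product attention per formula (6)."""
    sim = Q @ K.T / np.sqrt(K.shape[-1])
    a = np.exp(sim - sim.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)           # row-wise softmax weights a_i
    return a @ V

def multi_head(Q, K, V, Wq, Wk, Wv, Wo):
    """MultiHead(Q, K, V) = Concat(head_1 ... head_h) W^O per formula (5).
    Wq, Wk, Wv are lists holding one W_i^Q / W_i^K / W_i^V per head."""
    heads = [scaled_dot_attention(Q @ wq, K @ wk, V @ wv)
             for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

# Toy usage: a 5-token sequence, model dim 8, two heads of dim 4.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))
Wq = [rng.normal(size=(8, 4)) for _ in range(2)]
Wk = [rng.normal(size=(8, 4)) for _ in range(2)]
Wv = [rng.normal(size=(8, 4)) for _ in range(2)]
Wo = rng.normal(size=(8, 8))
out = multi_head(X, X, X, Wq, Wk, Wv, Wo)   # self-attention: Q = K = V = X
```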
Step three: correct the extracted triples with relation alignment. Inspired by the translation-based knowledge graph embedding model TransE, a translation-based relation alignment model is designed. Given a triple (h, r_1, t), the extracted entity and relation are required to satisfy the translation constraint h + r_1 ≈ t, whose scoring function can be expressed as f(h, r_1, t) = ||h + r_1 - t||.

If another triple (h, r_2, t) exists whose extracted entities and relation satisfy the translation constraint h + r_2 ≈ t, its scoring function is expressed as f(h, r_2, t) = ||h + r_2 - t||.

From the two constraints above, we may take f(h, r_1, t) ≈ f(h, r_2, t), hence r_1 ≈ r_2; similarly, the constrained relations form a relation set R = {r_1, r_2, ... r_n}, where h is the head entity in the triple, r_1, r_2, ... r_n are the relations in the triples, and t is the tail entity. The extraction result is then corrected with the relation set: if the relation classification of an entity pair predicts one member of a relation set, the pair's relation can be expressed as any member of that set, avoiding the drop in triple extraction accuracy caused by predicting only a single relation.
The entity relationship joint extraction method of the invention uses multi-label labeling to represent a plurality of labels of a word, and decomposes the multi-label into a plurality of independent data labeling problems, so as to solve the recognition problem of overlapping relationship; the problem of fixed window existing in the traditional encoding-decoding (Encoder-Decoder) architecture is solved by using an entity relation joint extraction method based on a composite attention mechanism, so that an Encoder learns more abundant context information; the decoder can learn the label information related to the current triples when decoding the triples, so that the influence of some invalid labels is eliminated, and the extraction efficiency of the effective triples is improved; in order to reduce the prediction loss of (head entity, tail entity) (E1, E2) entity pair in multi-relation classification, a relation alignment function is added in the prediction result of relation classification, so as to correct the triplet extraction result to adapt to multi-label labeling of (E1, E2) entity pair. The invention has the following effects: the method can effectively improve the accuracy of triplet extraction, and is an effective tool for extracting information aiming at unstructured data.
The foregoing has shown and described the basic principles, main features and advantages of the present invention. The above description merely illustrates the principles of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims. The scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. The entity relationship joint extraction method is characterized by comprising the following steps of:
performing multi-label labeling on corpus data to be processed;
inputting the sentences marked by the multiple labels into a joint extraction model, and identifying entities and relations among the entities contained in the sentences through the joint extraction model to construct triples;
the joint extraction model comprises an embedding layer that maps words from a high-dimensional discrete space to vectors in a low-dimensional continuous space, a bidirectional long short-term memory (Bi-LSTM) encoding layer that captures the semantic information of each word, a conditional random field (CRF) decoding layer that labels the linear data sequence, and a composite attention layer that jointly considers three aspects: the computation region, the information used, and the structural hierarchy;
the joint extraction model performs word vector and position vector splicing on the labeled corpus data of the multiple labels at the embedding layer, and then inputs the spliced vectors to the neural network coding layer; the joint extraction model jointly learns the rich context information of the input token through Bi-LSTM of the neural network coding layer and a composite attention mechanism of the neural network coding layer; the joint extraction model predicts the label of each word through CRF decoding of the decoding layer, and selects head entities and tail entities with relation through a composite attention mechanism of the decoding layer; finally, outputting the triples through an output layer;
correcting the extracted triples by using a relation alignment model;
the translation-based relation alignment model groups similar relations into a relation set, and the extracted triples are corrected with the relation set; the specific method for correcting the extracted triples with the relation set is as follows: if the relation classification of an entity pair predicts one member of a relation set, the pair's relation is represented by any member of that set;
the specific method for forming the relation set is as follows:

given a triple (h, r_1, t), the extracted entity and relation are required to satisfy the translation constraint h + r_1 ≈ t, whose scoring function is expressed as f(h, r_1, t) = ||h + r_1 - t||;

if another triple (h, r_2, t) exists whose extracted entities and relation satisfy the translation constraint h + r_2 ≈ t, its scoring function is expressed as f(h, r_2, t) = ||h + r_2 - t||;

by the above two constraints it is considered that f(h, r_1, t) ≈ f(h, r_2, t), hence r_1 ≈ r_2; by analogy, the relations satisfying the translation constraint form a relation set R = {r_1, r_2, ... r_n};

wherein h, r_1, r_2, ... r_n and t denote, respectively, the head entity in the triple, the relations in the triple, and the tail entity in the triple.
2. The method for entity-relationship joint extraction according to claim 1, wherein: the multi-label labeling of the corpus data to be processed comprises the following steps: and collecting corpus data for research, removing sentences with the relation labels of None, and performing multi-label labeling on the rest sentences to form a training set.
3. The method for entity-relationship joint extraction according to claim 1, wherein: the corpus data comprises complete information of sentences, sentence numbers, article numbers, mention relations in the sentences and entity information mentioned in the sentences; the multi-label labeling refers to labeling each word in a sentence with one or more labels by using a three-segment labeling method.
4. The method for entity-relationship joint extraction according to claim 1, wherein: the specific method for splicing the word vector and the position vector comprises the following steps: numbering each position of the language data, enabling each number to correspond to a position vector P, and combining word vectors W of the position to obtain an input vector input= [ W, P ] of a final Bi-LSTM coding layer.
5. The method for entity-relationship joint extraction according to claim 1, wherein: the conditional random field CRF decoding layer treats the data sequence labeling problem as a probability distribution problem and models the distribution P(y|x), where x denotes the observation sequence and y the tag sequence used to mark word boundaries in sentences; the two have the same chain structure and the same sequence length, and P(y|x) is obtained by the following equation:

P(y|x) = (1/Z(x)) · exp( Σ_{i,k} λ_k f_k(y_{i-1}, y_i, x, i) + Σ_{i,k} α_k g_k(y_i, x, i) )   (3)

wherein Z(x) is a normalization factor; f_k is a feature function representing the features of the output nodes of the observation sequence at positions i-1 and i; g_k represents the features of the input-output nodes at position i; λ_k and α_k represent the weights of the feature functions; k denotes the feature and i the position.
6. The method for entity-relationship joint extraction according to claim 1, wherein: from the perspective of the calculation region, global attention is adopted; from the point of view of the information used, the overall framework of the joint extraction method adopts self-attention; from the view of the structural hierarchy, a multi-layer Attention model is adopted; wherein the multi-layer Attention model is calculated by the following formula:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O   (5)

MultiHead(Q, K, V) represents the result of the multi-head splice; Q, K, V are short for query, key and value respectively, where query denotes a query, key a key, and value a value; head_i represents the result computed by the i-th Attention sublayer; each head maintains an independent query weight matrix W_i^Q, key weight matrix W_i^K and value weight matrix W_i^V, thereby generating different query, key and value matrices, and each head's result is obtained as head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V), with Attention(Q, K, V) = Σ_{i=1}^{L_x} a_i · Value_i and a_i = exp(Sim_i) / Σ_{j=1}^{L_x} exp(Sim_j); L_x represents the sequence length of each sublayer, a_i is the weight coefficient corresponding to Value_i, Sim_i is the similarity computed between the Query and a Key_i, j is the sequence index of each sublayer, and Sim_j is the scaled dot-product attention at sequence index j.
CN201911063750.2A 2019-11-04 2019-11-04 Entity relation joint extraction method Active CN110781683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911063750.2A CN110781683B (en) 2019-11-04 2019-11-04 Entity relation joint extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911063750.2A CN110781683B (en) 2019-11-04 2019-11-04 Entity relation joint extraction method

Publications (2)

Publication Number Publication Date
CN110781683A CN110781683A (en) 2020-02-11
CN110781683B true CN110781683B (en) 2024-04-05

Family

ID=69388700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911063750.2A Active CN110781683B (en) 2019-11-04 2019-11-04 Entity relation joint extraction method

Country Status (1)

Country Link
CN (1) CN110781683B (en)

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368528B (en) * 2020-03-09 2022-07-08 西南交通大学 Entity relation joint extraction method for medical texts
CN111460807B (en) * 2020-03-13 2024-03-12 平安科技(深圳)有限公司 Sequence labeling method, device, computer equipment and storage medium
CN111444715B (en) * 2020-03-24 2022-12-02 腾讯科技(深圳)有限公司 Entity relationship identification method and device, computer equipment and storage medium
CN111539211A (en) * 2020-04-17 2020-08-14 中移(杭州)信息技术有限公司 Entity and semantic relation recognition method and device, electronic equipment and storage medium
CN111737383B (en) * 2020-05-21 2021-11-23 百度在线网络技术(北京)有限公司 Method for extracting spatial relation of geographic position points and method and device for training extraction model
CN111476023B (en) * 2020-05-22 2023-09-01 北京明朝万达科技股份有限公司 Method and device for identifying entity relationship
CN113807079B (en) * 2020-06-11 2023-06-23 四川大学 Sequence-to-sequence-based end-to-end entity and relationship joint extraction method
CN111767409B (en) * 2020-06-14 2022-08-30 南开大学 Entity relationship extraction method based on multi-head self-attention mechanism
CN111832293B (en) * 2020-06-24 2023-05-26 四川大学 Entity and relation joint extraction method based on head entity prediction
CN111950297A (en) * 2020-08-26 2020-11-17 桂林电子科技大学 Abnormal event oriented relation extraction method
CN112214966A (en) * 2020-09-04 2021-01-12 拓尔思信息技术股份有限公司 Entity and relation combined extraction method based on deep neural network
CN112069823B (en) * 2020-09-17 2021-07-09 华院计算技术(上海)股份有限公司 Information processing method and device
CN112101009B (en) * 2020-09-23 2024-03-26 中国农业大学 Method for judging similarity of red-building dream character relationship frames based on knowledge graph
CN112148891A (en) * 2020-09-25 2020-12-29 天津大学 Knowledge graph completion method based on graph perception tensor decomposition
CN112149423B (en) * 2020-10-16 2024-01-26 中国农业科学院农业信息研究所 Corpus labeling method and system for domain entity relation joint extraction
CN112613306A (en) * 2020-12-31 2021-04-06 恒安嘉新(北京)科技股份公司 Method, device, electronic equipment and storage medium for extracting entity relationship
CN112749283A (en) * 2020-12-31 2021-05-04 江苏网进科技股份有限公司 Entity relationship joint extraction method for legal field
CN112685513A (en) * 2021-01-07 2021-04-20 昆明理工大学 Al-Si alloy material entity relation extraction method based on text mining
CN112487820B (en) * 2021-02-05 2021-05-25 南京邮电大学 Chinese medical named entity recognition method
CN112990985B (en) * 2021-04-26 2023-08-22 北京楚梵基业科技有限公司 Label joint probability analysis method and system
WO2022227196A1 (en) * 2021-04-27 2022-11-03 平安科技(深圳)有限公司 Data analysis method and apparatus, computer device, and storage medium
CN113158676A (en) * 2021-05-12 2021-07-23 清华大学 Professional entity and relationship combined extraction method and system and electronic equipment
CN113220844B (en) * 2021-05-25 2023-01-24 广东省环境权益交易所有限公司 Remote supervision relation extraction method based on entity characteristics
CN113486161A (en) * 2021-05-27 2021-10-08 中国电子科技集团公司电子科学研究院 Intelligent semantic retrieval system based on knowledge graph in special field
CN113221571B (en) * 2021-05-31 2022-07-01 重庆交通大学 Entity relation joint extraction method based on entity correlation attention mechanism
CN113590779B (en) * 2021-06-30 2023-04-25 四川大学 Construction method of intelligent question-answering system of knowledge graph in air traffic control field
CN113553385B (en) * 2021-07-08 2023-08-25 北京计算机技术及应用研究所 Relation extraction method for legal elements in judicial document
CN114004230B (en) * 2021-09-23 2022-07-05 杭萧钢构股份有限公司 Industrial control scheduling method and system for producing steel structure
CN114118056A (en) * 2021-10-13 2022-03-01 中国人民解放军军事科学院国防工程研究院工程防护研究所 Information extraction method for war research report
CN113901825B (en) * 2021-11-22 2024-05-03 东北大学 Entity relationship joint extraction method and system based on active deep learning
CN113947087B (en) * 2021-12-20 2022-04-15 太极计算机股份有限公司 Label-based relation construction method and device, electronic equipment and storage medium
CN114528418B (en) * 2022-04-24 2022-10-14 杭州同花顺数据开发有限公司 Text processing method, system and storage medium
CN115168599B (en) * 2022-06-20 2023-06-20 北京百度网讯科技有限公司 Multi-triplet extraction method, device, equipment, medium and product
CN115861715B (en) * 2023-02-15 2023-05-09 创意信息技术股份有限公司 Knowledge representation enhancement-based image target relationship recognition algorithm

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165385A (en) * 2018-08-29 2019-01-08 中国人民解放军国防科技大学 Multi-triple extraction method based on entity relationship joint extraction model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090019032A1 (en) * 2007-07-13 2009-01-15 Siemens Aktiengesellschaft Method and a system for semantic relation extraction

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165385A (en) * 2018-08-29 2019-01-08 中国人民解放军国防科技大学 Multi-triple extraction method based on entity relationship joint extraction model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ding Chen. Research on a joint model for entity recognition and relation extraction based on neural networks. China Master's Theses Full-text Database, Information Science and Technology. 2019, Chapter 4. *
Chen Jiafeng et al. A joint entity-relation extraction model based on reinforcement learning. Journal of Computer Applications. 2019. *

Also Published As

Publication number Publication date
CN110781683A (en) 2020-02-11

Similar Documents

Publication Publication Date Title
CN110781683B (en) Entity relation joint extraction method
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
CN111581961B (en) Automatic description method for image content constructed by Chinese visual vocabulary
CN113128229B (en) Chinese entity relation joint extraction method
CN114169330B (en) Chinese named entity recognition method integrating time sequence convolution and transform encoder
CN112818676B (en) Medical entity relationship joint extraction method
CN113221571B (en) Entity relation joint extraction method based on entity correlation attention mechanism
Zhang et al. Aspect-based sentiment analysis for user reviews
CN113743119B (en) Chinese named entity recognition module, method and device and electronic equipment
CN114417839A (en) Entity relation joint extraction method based on global pointer network
CN113704437A (en) Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN115688752A (en) Knowledge extraction method based on multi-semantic features
CN116932722A (en) Cross-modal data fusion-based medical visual question-answering method and system
CN116992042A (en) Construction method of scientific and technological innovation service knowledge graph system based on novel research and development institutions
Qin et al. A survey on text-to-sql parsing: Concepts, methods, and future directions
CN114020900B (en) Chart English abstract generating method based on fusion space position attention mechanism
CN114036934A (en) Chinese medical entity relation joint extraction method and system
CN114048314A (en) Natural language steganalysis method
CN116680377B (en) Chinese medical term self-adaptive alignment method based on log feedback
CN114298052B (en) Entity joint annotation relation extraction method and system based on probability graph
CN116151260A (en) Diabetes named entity recognition model construction method based on semi-supervised learning
CN116049422A (en) Echinococcosis knowledge graph construction method based on combined extraction model and application thereof
CN115545038A (en) Aspect emotion analysis method for optimizing grid label
CN115169285A (en) Event extraction method and system based on graph analysis
CN113869059A (en) Natural language text triple extraction method and system based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant