CN113032571A - Entity and relationship extraction method - Google Patents


Info

Publication number: CN113032571A
Application number: CN202110420639.5A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; no legal analysis has been performed)
Inventors: 程良伦, 牛伟才, 张伟文
Current and original assignee: Guangdong University of Technology
Other languages: Chinese (zh)
Application filed by Guangdong University of Technology
Priority to CN202110420639.5A, published as CN113032571A
Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis


Abstract

The invention discloses an entity and relation extraction method for addressing the poor entity-recognition and relation-extraction performance of prior-art methods. The method comprises the following steps: extracting multi-granularity feature representation information of each word in a preset text; extracting first node feature representation information of each word based on the multi-granularity feature representation information; constructing adaptive adjacency matrices for a plurality of preset relation types; extracting second node feature representation information of each word according to the adaptive adjacency matrices and the first node feature representation information; determining an entity type of each word based on its second node feature representation information; and calculating the relation category between any two words based on the second node feature representation information of each word.

Description

Entity and relationship extraction method
Technical Field
The invention relates to the technical field of text processing, in particular to an entity and relationship extraction method.
Background
Constructing large-scale knowledge graphs enables better service to fields such as text generation, question answering systems and recommendation systems. However, the triple information required by a knowledge graph is often hidden in massive unstructured Internet text, and labeling it purely by hand wastes a great deal of money and human resources. Therefore, it is important to extract correct entity and relation triples from large amounts of unstructured text.
Entity recognition and relation extraction, as fundamental tasks in natural language processing, have received wide attention from researchers. The earliest work treated the extraction of relational triples as two pipelined subtasks: first, all entities in a sentence are identified; then, the extracted entity pairs are classified into relations according to the semantic information of the sentence.
However, this approach often causes error propagation and accumulation, because if an entity is not correctly identified, the relation classification is necessarily misled by the erroneous information. Moreover, the method ignores the interaction between the two subtasks and therefore loses much important information. To address this problem, the prior art proposes entity-relation joint extraction methods based on feature engineering. These models aim to establish interaction between entities and relations; although they exploit entity and relation information simultaneously, their effectiveness depends heavily on hand-crafted feature engineering. With the development of neural networks, neural-network models are increasingly used in practical applications to automatically learn feature representations of sentences and thereby extract the relational triples implicit in text. Despite the progress made by these models, they still cannot cope with relational structure in complex contexts. For example, when two triples share the same entity, the model must accurately identify the related triples from the semantic information of the context; triples sharing an entity often bear a certain relationship to each other, e.g., there is a strong semantic relationship between the triple (Michael Jackson, BornIn, America) and the triple (State of Indiana, LocatedIn, America). However, existing joint extraction models ignore the semantic interaction between words under different relations, which loses much useful information and results in poor entity recognition and relation extraction.
Disclosure of Invention
The invention provides an entity and relationship extraction method, which is used for solving the technical problem that the entity recognition and relationship extraction effects of words in the prior art are poor.
The invention provides an entity and relationship extraction method, which comprises the following steps:
extracting multi-granularity characteristic representation information of each word in a preset text;
extracting first node feature representation information of the word based on the multi-granularity feature representation information;
constructing adaptive adjacency matrixes of various preset relation types;
extracting second node characteristic representation information of the word according to the self-adaptive adjacency matrix and the first node characteristic representation information;
determining an entity type of each of the words based on the second node feature representation information of each of the words;
calculating a relationship category between any two words based on the second node characteristic representation information of each of the words.
Optionally, the step of extracting multi-granularity feature representation information of each word in the preset text includes:
calculating hidden state representation information of each word in a preset text;
extracting character-level word features and word-level part-of-speech features of each word;
and generating multi-granularity characteristic representation information of each word by adopting the hidden state representation information, the character-level word characteristics and the word-level part-of-speech characteristics.
Optionally, the step of extracting the first node feature representation information of the word based on the multi-granularity feature representation information includes:
creating an adjacency matrix for the word;
extracting incoming node representation information and outgoing node representation information of the word by adopting the adjacency matrix and the multi-granularity feature representation information;
generating first node characteristic representation information of the word using the incoming node representation information and the outgoing node representation information.
Optionally, the step of constructing an adaptive adjacency matrix of multiple preset relationship types includes:
obtaining sentence characteristics of sentences in which each word is located;
acquiring a hidden state dimension and a preset input dimension of the sentence characteristics;
calculating dependency weight initial hidden representation information of the words by adopting the sentence characteristics, the hidden state dimension and the input dimension;
adopting the initial hidden representation information to respectively calculate the respective corresponding dependency information of the words under various preset relationship types;
and constructing self-adaptive adjacency matrixes respectively corresponding to a plurality of preset relation types based on the dependency information.
Optionally, the step of extracting second node feature representation information of the word according to the adaptive adjacency matrix and the first node feature representation information includes:
calculating forward characteristic representation information and backward characteristic representation information of the word by adopting the self-adaptive adjacency matrix and the first node characteristic representation information;
and generating second node feature representation information of the word by using the forward feature representation information and the backward feature representation information.
Optionally, the dependency information includes a query vector and a key-value vector.
Optionally, the step of extracting the character-level word features and the word-level part-of-speech features of each word includes:
and extracting character-level word features and word-level part-of-speech features of each word by adopting a preset bidirectional long short-term memory network.
According to the technical scheme, the invention has the following advantages: the invention provides a method for extracting entities and relations in a combined manner, and particularly discloses the following steps: extracting multi-granularity characteristic representation information of each word in a preset text; extracting first node feature representation information of the word based on the multi-granularity feature representation information; constructing adaptive adjacency matrixes of various preset relation types; extracting second node characteristic representation information of the word according to the self-adaptive adjacency matrix and the first node characteristic representation information; determining an entity type of each word based on the second node feature representation information of each word; a relation category between any two words is calculated based on the second node characteristic representation information of each word. According to the method, the self-adaptive adjacency matrixes of various preset relationship types are constructed, the second node characteristic representation information of the words is calculated based on the self-adaptive adjacency matrixes under different relationship types, so that semantic interaction between the words under different relationship types is captured, and the recognition effect of the entity types of the words and the relationship types among different words is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is an example of different categories of sentences;
FIG. 2 is a flowchart illustrating steps of a method for extracting entities and relationships according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating steps of an entity and relationship extraction method according to another embodiment of the present invention.
Detailed Description
In practical applications, early joint extraction models cannot extract overlapping triples from plain text. As shown in fig. 1, sentences are classified into three types according to the degree of relation overlap: Normal, single entity overlap (SingleEntityOverlap) and entity pair overlap (EntityPairOverlap). The first sentence is of the Normal type, i.e., its triples share no entities. The second sentence is a single-entity-overlap case, i.e., two triples share one entity. The third sentence is an entity-pair-overlap case, i.e., two entities have multiple relations between them. The triple-overlap problem directly plagues sequence-labeling-based joint extraction schemes, which assume only one label per token. To solve the problem of overlapping triples, researchers have proposed a copy mechanism that repeatedly copies entities from the sentence, but its performance is poor because not all entities can be copied at once. To enhance the interaction between entities and relations, researchers have proposed using graph neural networks to model text as a relation-weighted graph and to predict entities and relations simultaneously. However, previous studies have ignored the semantic interaction between words under different relations, which loses much useful information.
In view of this, embodiments of the present invention provide an entity and relationship extraction method, which is used to solve the technical problem that the entity recognition and relationship extraction effects of the word in the prior art are poor.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 2, fig. 2 is a flowchart illustrating steps of an entity and relationship extraction method according to an embodiment of the present invention.
The entity and relationship extraction method provided by the invention specifically comprises the following steps:
step 201, extracting multi-granularity characteristic representation information of each word in a preset text;
in the embodiment of the invention, multi-granularity feature representation information of each word in a sentence can be extracted through a feature mixing layer, which comprises a global context feature encoder, a character feature encoder and a part-of-speech feature encoder.
Step 202, extracting first node feature representation information of a word based on multi-granularity feature representation information;
after obtaining the multi-granularity feature representation information of each word, the feature mixing layer and a first-stage BiGCN (bidirectional graph convolutional network) layer may be stacked to automatically obtain the first node feature representation information of each word.
In a BiGCN, when computing the features of the current node, both the features along edges pointing into the node (incoming features) and the features along edges pointing out of the node (outgoing features) are aggregated.
Step 203, constructing adaptive adjacency matrixes of various preset relationship types;
in order to enhance information interaction among all parts of the triples, the embodiment of the invention provides a node-aware attention mechanism to acquire hidden association information among words. Thus, a complete word association matrix can be established and is more suitable for real data distribution through supervised learning.
Specifically, the dependency weights for words in different relationship spaces are different. To more flexibly predict overlapping triples, the node-aware attention mechanism of embodiments of the present invention dynamically learns the strength of correlation between different words in each relationship space in an end-to-end manner. The original dependency tree is converted into a plurality of fully connected graphs. Each graph contains semantic information under different relation spaces, and a dependency relation adaptive adjacency matrix under each relation type is further constructed.
Step 204, extracting second node characteristic representation information of the word according to the self-adaptive adjacency matrix and the first node characteristic representation information;
after the adaptive adjacency matrix of each predefined relation is acquired, the first node feature representation information is used as initial sentence input information of the BiGCN feature extractor in the second stage. The feature information of the sentences is mapped into different relation spaces, and the dependency correlation strength between the nodes is dynamically learned by using the adaptive adjacency matrix. And then fusing the node dependency relationship information and the first node feature representation information extracted by the BiGCN under all relationship spaces to obtain second node feature representation information of each word.
Step 205, determining the entity type of each word based on the second node characteristic representation information of each word;
and step 206, calculating the relation category between any two words based on the second node characteristic representation information of each word.
In the embodiment of the invention, the entity types and the relation categories in the text can be extracted simultaneously by utilizing the feature mixing layer and the data extracted by the two-stage BiGCN.
According to the method, the self-adaptive adjacency matrixes of various preset relationship types are constructed, the second node characteristic representation information of the words is calculated based on the self-adaptive adjacency matrixes under different relationship types, so that semantic interaction between the words under different relationship types is captured, and the recognition effect of the entity types of the words and the relationship types among different words is improved.
Referring to fig. 3, fig. 3 is a flowchart illustrating steps of an entity and relationship extraction method according to another embodiment of the present invention. The method specifically comprises the following steps:
step 301, calculating hidden state representation information of each word in a preset text;
in an embodiment of the present invention, context information may first be encoded using a pre-trained BERT model. BERT is a serial stack of N identical Transformer modules, each of which can be represented as Trans(x), where x denotes the input vector of a sentence. The BERT model is used as follows:

h^0 = W_s x + W_p

h^l = Trans(h^{l-1}), l ∈ [1, N]

where x indexes the (sub-)words in the vocabulary, W_s is the sub-word embedding matrix, W_p is the position embedding matrix of the input sentence, and h^l is the hidden state representation information of the l-th layer.
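The input-embedding step above can be sketched in a few lines. This is a toy numpy illustration, not the actual pre-trained BERT: W_s and W_p are randomly initialized here (in practice they come from the pre-trained checkpoint), the sizes are illustrative, and the Transformer stack Trans(·) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, max_len, d_model = 100, 16, 8      # toy sizes (assumed)
W_s = rng.normal(size=(vocab_size, d_model))   # sub-word embedding matrix
W_p = rng.normal(size=(max_len, d_model))      # position embedding matrix

def bert_input_embedding(token_ids):
    """h^0 = W_s[ids] + W_p : sub-word embedding plus position embedding."""
    n = len(token_ids)
    return W_s[token_ids] + W_p[:n]

token_ids = np.array([5, 17, 42, 7])
h0 = bert_input_embedding(token_ids)           # shape (4, d_model)
```

In the full model, h^0 would then pass through the N Transformer layers to produce the contextual hidden states h^l.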
Step 302, extracting character-level word characteristics and word-level part-of-speech characteristics of each word;
in addition to the global context representation of the sentence, embodiments of the present invention also introduce character embedding and part-of-speech embedding. A bidirectional Long Short-Term Memory network (BiLSTM) is used as the feature extractor for character-level and part-of-speech information; in a BiLSTM, the output at the current time step is related not only to previous states but also to future states. First, a character embedding matrix and a part-of-speech embedding matrix are randomly initialized, and then fed into the bidirectional LSTM to extract character-level word features and word-level part-of-speech features:

h_c = BiLSTM_c(x_c)

h_p = BiLSTM_p(x_p)

where h_c and h_p are the character and part-of-speech features of the input sentence, and d_c, d_p are the hidden-state dimensions of the corresponding BiLSTMs.
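As a rough sketch of these encoders, the following toy numpy BiLSTM runs one LSTM cell left-to-right and another right-to-left over randomly initialized per-word embeddings and concatenates the two directions. All sizes, the single-layer structure, and the random inputs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_params(d_in, d_h, rng):
    # One stacked weight matrix for the four gate blocks [input, forget, cell, output].
    return (rng.normal(scale=0.1, size=(4 * d_h, d_in + d_h)),
            np.zeros(4 * d_h))

def lstm_forward(X, params):
    """Run a single LSTM cell over the rows of X, returning all hidden states."""
    W, b = params
    d_h = b.shape[0] // 4
    h, c, out = np.zeros(d_h), np.zeros(d_h), []
    for x in X:
        z = W @ np.concatenate([x, h]) + b
        i, f, g, o = z[:d_h], z[d_h:2*d_h], z[2*d_h:3*d_h], z[3*d_h:]
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out.append(h)
    return np.stack(out)

def bilstm(X, fwd, bwd):
    h_f = lstm_forward(X, fwd)              # left-to-right pass
    h_b = lstm_forward(X[::-1], bwd)[::-1]  # right-to-left pass, re-aligned
    return np.concatenate([h_f, h_b], axis=-1)   # (n, 2*d_h)

d_c, d_hc = 6, 5                            # char-embedding dim, hidden dim (toy)
X_char = rng.normal(size=(4, d_c))          # per-word character-level embeddings
h_c = bilstm(X_char, lstm_params(d_c, d_hc, rng), lstm_params(d_c, d_hc, rng))
```

The part-of-speech encoder h_p = BiLSTM_p(x_p) works identically on the part-of-speech embedding matrix.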
Step 303, generating multi-granularity characteristic representation information of each word by adopting the hidden state representation information, the character-level word characteristics and the word-level part-of-speech characteristics;
then, the three kinds of feature information, namely the hidden state representation information of each word, the character-level word features and the word-level part-of-speech features, are concatenated to obtain the combined feature representation of the sentence h_s = [h_w; h_c; h_p]. This combined representation is then fed into a BiLSTM to further extract the implicit associations among the three features, yielding the multi-granularity feature representation information of the words. The generated multi-granularity feature representation can greatly alleviate the out-of-vocabulary (OOV) problem and mine latent associated semantic information:

h_u = BiLSTM(h_s)

where h_u is the multi-granularity feature representation information, and d_w, d_c, d_p are the hidden-layer dimensions of the context embedding, character embedding and part-of-speech embedding, respectively.
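A minimal sketch of the feature mixing step, assuming toy dimensions for the three feature streams; the follow-up BiLSTM pass described above is represented only by a comment here.

```python
import numpy as np

n = 4                                   # words in the sentence (toy)
h_w = np.ones((n, 8))                   # contextual (BERT) features
h_c = np.ones((n, 10))                  # character-level BiLSTM features
h_p = np.ones((n, 6))                   # part-of-speech BiLSTM features

# h_s = [h_w; h_c; h_p] : per-word concatenation of the three granularities
h_s = np.concatenate([h_w, h_c, h_p], axis=-1)
# h_s would then be fed to another BiLSTM to mine cross-feature associations.
```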
Step 304, extracting first node feature representation information of the word based on the multi-granularity feature representation information;
step 304 may include the following sub-steps:
s41, creating an adjacency matrix of words;
s42, adopting the adjacency matrix and the multi-granularity characteristic representation information, and extracting the incoming node representation information and the outgoing node representation information of the words;
s43, using the incoming node representing information and the outgoing node representing information, generating first node characteristic representing information of the word.
In practical application, the original data passes through the encoding layer and the feature mixing layer to obtain the representation information of the input sentence. However, because a fixed graph structure is lacking, the invention uses a syntactic dependency parser to create a dependency tree, which is finally used as the adjacency matrix over the input sentence's feature information, and a GCN is used to extract regional dependency features. The original GCN uses an undirected graph to extract features of the input sentence, but since dependency direction information exists in the dependency tree, an undirected graph loses much dependency structure information. The invention considers the information propagated out of a word node and the information propagated into it at the same time, i.e., a bidirectional GCN (BiGCN) is used.
The method comprises the following specific steps:
h_out^l(u) = ReLU( Σ_v A_{uv} (W_out^l h_v^{l-1} + b_out^l) )

h_in^l(u) = ReLU( Σ_v A_{vu} (W_in^l h_v^{l-1} + b_in^l) )

h^l(u) = [ h_out^l(u) ; h_in^l(u) ]

where h^l(u) is the first node feature representation information of the u-th word, comprising the outgoing node representation information h_out propagated from the node to neighboring nodes and the incoming node representation information h_in received from neighboring nodes; W_out and W_in are parameters that can be learned in the model and represent the weights of the node word's outward output and inward input, respectively. The initial input of node u is the output of the feature mixing layer, i.e., the multi-granularity feature representation information, and A_{ij} is the adjacency matrix derived from the dependency tree. Finally, the incoming node representation information and the outgoing node representation information are concatenated as the first node feature representation information of the node.
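One BiGCN layer of this kind can be sketched in numpy: outgoing features are aggregated with A, incoming features with A^T, and the two directions are concatenated. The toy adjacency matrix, the added self-loops, and all dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(x):
    return np.maximum(x, 0.0)

def bigcn_layer(H, A, W_out, b_out, W_in, b_in):
    """One BiGCN layer: aggregate along outgoing edges (A) and
    incoming edges (A^T), then concatenate the two directions."""
    h_out = relu(A @ H @ W_out + b_out)      # node -> neighbours
    h_in  = relu(A.T @ H @ W_in + b_in)      # neighbours -> node
    return np.concatenate([h_out, h_in], axis=-1)

n, d, d_h = 4, 6, 5
# Toy dependency arcs (a chain), plus self-loops so each node keeps its own feature.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float) + np.eye(n)
H = rng.normal(size=(n, d))                  # multi-granularity features (mixing-layer output)
W_out = rng.normal(scale=0.1, size=(d, d_h))
W_in = rng.normal(scale=0.1, size=(d, d_h))
H1 = bigcn_layer(H, A, W_out, np.zeros(d_h), W_in, np.zeros(d_h))
```

H1 here plays the role of the first node feature representation information fed to the later stages.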
Step 305, constructing adaptive adjacency matrixes of various preset relationship types;
in an embodiment of the present invention, step 305 may include the following sub-steps:
s51, obtaining sentence characteristics of the sentence in which each word is located;
s52, acquiring hidden state dimension and preset input dimension of sentence characteristics;
s53, calculating the dependency weight initial hidden representation information of the words by using sentence characteristics, hidden state dimensions and input dimensions;
s54, adopting the initial hidden representation information to respectively calculate the respective corresponding dependency information of the words under various preset relationship types; the dependency information comprises a query vector and a key value vector;
and S55, constructing adaptive adjacency matrixes corresponding to multiple preset relationship types respectively based on the dependency information.
In practical applications, the dependency weights of words in different relationship spaces are different. To more flexibly predict overlapping triples, embodiments of the present invention provide a node-aware attention mechanism to dynamically learn the strength of correlation between different words in each relationship space in an end-to-end manner. The original dependency tree is converted into a plurality of fully connected graphs, wherein each graph contains semantic information under a different relationship space. Specifically, first, a fully-connected layer is run on the output of the Bi-GCN in the first stage to obtain weight-dependent initial hidden representation information:
S = U W_a + b_a

where S is the initial hidden representation information of the dependency weights; W_a and b_a are parameters the model needs to learn; and d_u and d_a are the hidden-state dimension of the sentence feature U and the input dimension in S, respectively.
Since the feature semantic information required to learn each relationship space is different, separate computation of dependency information for word nodes is required for different relationship types. Projecting the feature representation of the sentence to feature subspaces of different relation types, wherein the specific formula is as follows:
Q_m = S W_q^m

K_m = S W_k^m

where Q_m and K_m are the query vector and key-value vector of the m-th relation category; N represents the length of the input sentence; W_q^m and W_k^m are parameters of the model; and d_r is the dimension of each relation space.
Then, the dependency adaptive adjacency matrix A^m under each relation type can be constructed, where A^m_{ij} represents the strength of the dependency relationship between the i-th node and the j-th node under relation type m. The adjacency matrix under a given relation is constructed as follows:

A^m = softmax( Q_m K_m^T / √d_r )

where d_r is the dimension of each relation space and T denotes the transpose.
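The node-aware attention construction above can be sketched directly: project S into per-relation query/key spaces and take a scaled-dot-product softmax to obtain one adjacency matrix per relation type. All sizes and random initializations below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_adjacency(U, W_a, b_a, W_q, W_k):
    """Per-relation adjacency A^m = softmax(Q_m K_m^T / sqrt(d_r))."""
    S = U @ W_a + b_a                    # initial hidden representation of dependency weights
    d_r = W_q.shape[-1]
    A = []
    for m in range(W_q.shape[0]):        # one adjacency matrix per relation type
        Q, K = S @ W_q[m], S @ W_k[m]
        A.append(softmax(Q @ K.T / np.sqrt(d_r)))
    return np.stack(A)                   # shape (M, n, n)

n, d_u, d_a, d_r, M = 4, 10, 8, 6, 3     # M = number of relation types (toy)
U = rng.normal(size=(n, d_u))            # first-stage BiGCN output (stand-in)
A = adaptive_adjacency(U,
                       rng.normal(size=(d_u, d_a)), np.zeros(d_a),
                       rng.normal(size=(M, d_a, d_r)),
                       rng.normal(size=(M, d_a, d_r)))
```

Each row of every A^m is a softmax distribution, so the learned dependency strengths of a node over all other nodes sum to one within each relation space.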
Notably, the number of constructed adaptive adjacency matrices is consistent with the number of relationship types in the dataset.
Step 306, extracting second node characteristic representation information of the word according to the self-adaptive adjacent matrix and the first node characteristic representation information;
in an embodiment of the present invention, step 306 may include the following sub-steps:
s61, calculating forward characteristic representation information and backward characteristic representation information of the word by adopting the self-adaptive adjacent matrix and the first node characteristic representation information;
s62, second node feature representing information of the word is generated using the forward feature representing information and the backward feature representing information.
In a specific implementation, after obtaining the adaptive adjacency matrix for each predefined relationship, the node feature information extracted by the first BiGCN may be used as the initial sentence input information of the second-stage BiGCN. Unlike the first BiGCN feature extractor, in the BiGCN of the second stage, the feature information of sentences is mapped to different relationship feature spaces, and the dependent association strength between nodes is dynamically learned by using an adaptive adjacency matrix, which is helpful for extracting the information of overlapping triples.
In addition, the BiGCN feature extractor at the two stages can also establish information interaction between named entities and relationships, so that the model can extract all triples in a sentence to the maximum extent. The second stage BiGCN feature extractor operates as follows:
h_fwd^{m,l}(u) = ReLU( Σ_j A^m_{uj} (W_fwd^{m,l} h^{m,l-1}(j) + b_fwd^{m,l}) )

h_bwd^{m,l}(u) = ReLU( Σ_j A^m_{ju} (W_bwd^{m,l} h^{m,l-1}(j) + b_bwd^{m,l}) )

h^m(u) = [ h_fwd^m(u) ; h_bwd^m(u) ]

h^{(2)}(u) = [ h^{(1)}(u) ; Σ_{m=1}^{M} h^m(u) ]

where A^m_{ij} represents the dependency relationship strength of node i and node j under the m-th relation; h^{m,l-1}(j) is the feature representation of node j at layer l-1 under relation m, whose initial input is the output h^{(1)} of the first BiGCN; h_fwd^m(u) is the forward feature representation information of BiGCN node u under the m-th relation, and h_bwd^m(u) is the backward feature representation information under the m-th relation. As in the first-stage BiGCN, both the incoming and outgoing information of the node are taken into account. h^{(2)}(u) is the second node feature representation information of the word serving as node u.
It should be noted that, in the second-stage BiGCN feature extractor, the node dependency information extracted by the BiGCN in all relation spaces is fused with the first node feature representation information extracted by the first-stage BiGCN, and named entity recognition and node relation classification are then performed again. Because the node dependency information under the different relation spaces is fused, all entities and relations appearing in the text can be extracted to the greatest extent.
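As a concrete illustration, the second-stage computation described above can be sketched in NumPy. This is a minimal sketch rather than the patented implementation: the ReLU activation, the bias-free linear maps, and concatenation as the fusion step are assumptions, and all function and variable names are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def bigcn_layer(A_hat, H, W_fwd, W_bwd):
    """One second-stage BiGCN layer in a single relation space (sketch).

    A_hat : (n, n) adaptive adjacency matrix for this relation; entry
            [i, j] is the learned dependency strength between nodes i, j.
    H     : (n, d) node features from the previous layer; initially the
            first node feature representation from the first-stage BiGCN.
    W_fwd, W_bwd : (d, k) weights for outgoing / incoming aggregation.
    """
    h_fwd = relu(A_hat @ H @ W_fwd)    # forward: aggregate over outgoing edges
    h_bwd = relu(A_hat.T @ H @ W_bwd)  # backward: aggregate over incoming edges
    return np.concatenate([h_fwd, h_bwd], axis=-1)   # (n, 2k)

def second_node_features(A_hats, H, W_fwd, W_bwd):
    """Run the layer in every relation space and fuse by concatenation."""
    return np.concatenate(
        [bigcn_layer(A, H, W_fwd, W_bwd) for A in A_hats], axis=-1
    )  # (n, num_relations * 2k)
```

For a sentence of 4 words, 3 relation types and d = k = 8, `second_node_features` returns a (4, 48) matrix: one fused second node feature representation per word.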
Step 307, determining an entity type of each word based on the second node feature representation information of each word;
and step 308, calculating the relation category between any two words based on the second node feature representation information of each word.
In the embodiment of the invention, the entity types and the relation categories in the text can be extracted simultaneously by using the feature mixing layer and the features extracted by the two-stage BiGCN.
For entity recognition, the first node feature representation information and the second node feature representation information of each word are fed to a linear connection layer, and the sequence labels of the text are then obtained by using a softmax function:
$$\hat{y}_{i}=\mathrm{softmax}(W_{e}\,h_{e_i}+b_{e})$$

$$L_{e}=-\sum_{i=1}^{n}\log P(y_{i}\mid e_{i},s)$$

where $W_{e}$ and $b_{e}$ are parameters of the model, $y_{i}$ is the true tag, $e_{i}$ is the $i$-th node (the $i$-th word), $L_{e}$ is the loss function, $s$ is the input sentence sequence, $n$ is the number of words in the sentence, and $\hat{y}_{i}$ is the predicted category of the $i$-th word.
From the sequence labels, the entity type of each word can be identified.
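A minimal sketch of this tagging head follows. The BIO-style label set, and treating the fused node features as a single matrix `H`, are illustrative assumptions; all names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def tag_words(H, W_e, b_e, labels):
    """Linear layer + softmax over each word's fused node features (sketch).

    H      : (n, d) first and second node feature representations, concatenated
    W_e    : (d, num_tags), b_e : (num_tags,) -- model parameters
    labels : tag vocabulary, e.g. ["O", "B-PER", "I-PER"] (hypothetical set)
    """
    probs = softmax(H @ W_e + b_e)          # (n, num_tags)
    return [labels[i] for i in probs.argmax(axis=-1)]
```

The entity type of each word is then read off from its predicted sequence label.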
For relation extraction, the node feature representation information of any two words can be fed to separate linear connection layers, and the relation category of the two words can then be obtained by using a softmax function:
$$S(e_{i},r,e_{j})=W^{(2)}_{r}\,\mathrm{ReLU}\big(W^{(1)}_{r}\,[h_{e_i};h_{e_j}]\big)$$

$$Pr(r\mid e_{i},e_{j},s)=\sigma\big(S(e_{i},r,e_{j})\big)$$

$$L_{r}=-\sum_{i,j}\log Pr(r\mid e_{i},e_{j},s)$$

where $W^{(1)}_{r}$ and $W^{(2)}_{r}$ are parameters of the model, and $S(e_{i},r,e_{j})$ is the score of the two words under the relation $r$; note that $S(e_{i},r,e_{j})$ is different from $S(e_{j},r,e_{i})$.
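The asymmetric pairwise scoring can be sketched as follows. The exact form of the score (two linear maps around a ReLU acting on the ordered concatenation of the pair) is an assumption; only the asymmetry requirement is stated explicitly in the text, and all names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relation_score(h_i, h_j, W1_r, W2_r):
    """Score S(e_i, r, e_j) of an ordered word pair under relation r (sketch).

    Because the pair is concatenated in order before the linear maps are
    applied, S(e_i, r, e_j) is generally different from S(e_j, r, e_i).
    """
    pair = np.concatenate([h_i, h_j])                # order matters
    return float(W2_r @ np.maximum(0.0, W1_r @ pair))

def relation_prob(h_i, h_j, W1_r, W2_r):
    """Pr(r | e_i, e_j, s) = sigma(S(e_i, r, e_j))."""
    return sigmoid(relation_score(h_i, h_j, W1_r, W2_r))
```

Swapping the two arguments changes the score, so the model can distinguish the subject from the object of a relation.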
According to the method, adaptive adjacency matrices are constructed for a plurality of preset relation types, and the second node feature representation information of each word is computed from the adaptive adjacency matrices under the different relation types. This captures the semantic interaction between words under different relation types and improves the recognition of the entity types of words and of the relation categories between different words.
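Claims 4 and 6 describe the adaptive adjacency matrices as being built from a query vector and a key-value vector per relation; one natural reading is scaled dot-product attention over the sentence features. The sketch below assumes that form; the softmax normalization and all names are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def adaptive_adjacency(H, W_q, W_k):
    """One adaptive adjacency matrix per preset relation type (sketch).

    H        : (n, d) features of the sentence containing the words
    W_q, W_k : (m, d, d_k) per-relation query / key projections
    Returns (m, n, n); entry [r, i, j] is the learned dependency
    strength between word i and word j under relation r.
    """
    mats = []
    for Wq_r, Wk_r in zip(W_q, W_k):
        Q, K = H @ Wq_r, H @ Wk_r                      # queries and keys
        mats.append(softmax(Q @ K.T / np.sqrt(Q.shape[-1])))
    return np.stack(mats)
```

Each relation space gets its own dense, dynamically learned adjacency matrix, which is what lets the second-stage BiGCN weight word pairs differently under different relation types.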
For ease of understanding, the embodiment of the present invention was evaluated on the two public data sets NYT and WebNLG; the distribution of sentences and triples in the data sets is shown in Table 1, and the experimental results are shown in Table 2. Compared with the most advanced joint extraction models, the experimental results show that the method provided by the invention achieves improvements of 6.5% and 11.4% on the NYT and WebNLG data sets, respectively.
[Table 1 (image in original): distribution of sentences and triples in the NYT and WebNLG data sets]
[Table 2 (image in original): experimental results on the NYT and WebNLG data sets]
Table 3 shows the entity recognition effect (F1 value) of the embodiment of the present invention on NYT and WebNLG. By comparison, the embodiment of the invention improves on the two data sets by 1.8% and 4.9%, respectively. This shows that the method can accurately identify the entities in the text, which in turn greatly improves the triple extraction effect.
Method                       NYT     WebNLG
GraphRel                     0.892   0.919
AntNRE                       0.925   0.916
Embodiment of the invention  0.943   0.965

TABLE 3
Table 4 shows three examples of extraction results on the NYT data set, with one representative sentence selected from each of the Normal, SEO and EPO test sets. The first sentence contains only one triple and can therefore be easily identified. The second sentence has two triples that share one entity; by extracting the feature information of all words in the sentence, the latent connection between the two triples can be inferred, so that all triples in the sentence are extracted. The third sentence is of the entity-pair overlap (EPO) type; since the embodiment of the present invention cannot completely extract all overlapping entity-pair triples, one triple is missed.
[Table 4 (image in original): example extraction results on the NYT data set]
TABLE 4
According to the experimental results, the embodiment of the invention improves the recognition of the entity types of words and of the relation categories between different words.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. An entity and relationship extraction method, comprising:
extracting multi-granularity characteristic representation information of each word in a preset text;
extracting first node feature representation information of the word based on the multi-granularity feature representation information;
constructing adaptive adjacency matrices of a plurality of preset relation types;
extracting second node feature representation information of the word according to the adaptive adjacency matrix and the first node feature representation information;
determining an entity type of each of the words based on the second node feature representation information of each of the words;
calculating a relation category between any two words based on the second node feature representation information of each of the words.
2. The method according to claim 1, wherein the step of extracting multi-granularity feature representation information of each word in the preset text comprises:
calculating hidden state representation information of each word in a preset text;
extracting character-level word features and word-level part-of-speech features of each word;
and generating multi-granularity feature representation information of each word by using the hidden state representation information, the character-level word features and the word-level part-of-speech features.
3. The method of claim 1, wherein the step of extracting the first node feature representation information of the word based on the multi-granularity feature representation information comprises:
creating an adjacency matrix for the word;
extracting incoming node representation information and outgoing node representation information of the word by adopting the adjacency matrix and the multi-granularity feature representation information;
generating first node characteristic representation information of the word using the incoming node representation information and the outgoing node representation information.
4. The method of claim 1, wherein the step of constructing the adaptive adjacency matrices of the plurality of preset relation types comprises:
obtaining sentence features of the sentence in which each word is located;
obtaining a hidden state dimension and a preset input dimension of the sentence features;
calculating initial hidden representation information of dependency weights of the words by using the sentence features, the hidden state dimension and the input dimension;
calculating, by using the initial hidden representation information, the dependency information respectively corresponding to the words under the plurality of preset relation types;
and constructing the adaptive adjacency matrices respectively corresponding to the plurality of preset relation types based on the dependency information.
5. The method of claim 4, wherein the step of extracting the second node feature representation information of the word according to the adaptive adjacency matrix and the first node feature representation information comprises:
calculating forward feature representation information and backward feature representation information of the word by using the adaptive adjacency matrix and the first node feature representation information;
and generating second node feature representation information of the word by using the forward feature representation information and the backward feature representation information.
6. The method of claim 4, wherein the dependency information comprises a query vector and a key-value vector.
7. The method of claim 2, wherein said step of extracting character-level word features and word-level part-of-speech features of each of said words comprises:
and extracting the character-level word features and the word-level part-of-speech features of each word by using a preset bidirectional long short-term memory (BiLSTM) network.
CN202110420639.5A 2021-04-19 2021-04-19 Entity and relationship extraction method Pending CN113032571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110420639.5A CN113032571A (en) 2021-04-19 2021-04-19 Entity and relationship extraction method


Publications (1)

Publication Number Publication Date
CN113032571A true CN113032571A (en) 2021-06-25

Family

ID=76456886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110420639.5A Pending CN113032571A (en) 2021-04-19 2021-04-19 Entity and relationship extraction method

Country Status (1)

Country Link
CN (1) CN113032571A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160008A (en) * 2019-12-18 2020-05-15 华南理工大学 Entity relationship joint extraction method and system
CN112163425A (en) * 2020-09-25 2021-01-01 大连民族大学 Text entity relation extraction method based on multi-feature information enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
牛伟才等: "GCN2-NAA Two-stage Graph Convolutional Networks with Node-Aware Attention for Joint Entity and Relation Extraction", 《ICMLC\'21》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609846A (en) * 2021-08-06 2021-11-05 首都师范大学 Method and device for extracting entity relationship in statement
CN113609846B (en) * 2021-08-06 2022-10-04 首都师范大学 Method and device for extracting entity relationship in statement

Similar Documents

Publication Publication Date Title
CN111753024B (en) Multi-source heterogeneous data entity alignment method oriented to public safety field
CN112069408B (en) Recommendation system and method for fusion relation extraction
CN109902301B (en) Deep neural network-based relationship reasoning method, device and equipment
CN111814487B (en) Semantic understanding method, device, equipment and storage medium
CN112084335A (en) Social media user account classification method based on information fusion
CN113626589B (en) Multi-label text classification method based on mixed attention mechanism
CN110704576A (en) Text-based entity relationship extraction method and device
CN110674317A (en) Entity linking method and device based on graph neural network
CN113569001A (en) Text processing method and device, computer equipment and computer readable storage medium
CN113268609A (en) Dialog content recommendation method, device, equipment and medium based on knowledge graph
CN110188454A (en) Architectural Equipment and Building Information Model matching process and device
CN111914553B (en) Financial information negative main body judging method based on machine learning
CN113516198A (en) Cultural resource text classification method based on memory network and graph neural network
CN112131345A (en) Text quality identification method, device, equipment and storage medium
CN111581379A (en) Automatic composition scoring calculation method based on composition question-deducting degree
CN113239143B (en) Power transmission and transformation equipment fault processing method and system fusing power grid fault case base
CN114357167A (en) Bi-LSTM-GCN-based multi-label text classification method and system
CN113032571A (en) Entity and relationship extraction method
US20230168989A1 (en) BUSINESS LANGUAGE PROCESSING USING LoQoS AND rb-LSTM
CN112287239B (en) Course recommendation method and device, electronic equipment and storage medium
CN114880991A (en) Knowledge map question-answer entity linking method, device, equipment and medium
CN111538898B (en) Web service package recommendation method and system based on combined feature extraction
CN114491029A (en) Short text similarity calculation method based on graph neural network
CN112948561A (en) Method and device for automatically expanding question-answer knowledge base
CN116484004B (en) Dialogue emotion recognition and classification method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination