CN115934944A - Entity relation extraction method based on Graph-MLP and adjacent contrast loss - Google Patents


Info

Publication number
CN115934944A
CN115934944A
Authority
CN
China
Prior art keywords
training text
mlp
loss
graph
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211594439.2A
Other languages
Chinese (zh)
Inventor
吴涛
游小琳
先兴平
宋秀丽
姜丰
徐敖远
张浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202211594439.2A priority Critical patent/CN115934944A/en
Publication of CN115934944A publication Critical patent/CN115934944A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an entity relation extraction method based on Graph-MLP and adjacent contrast loss, which comprises the following steps: acquiring training text data with label information; performing word segmentation on the training text according to the vocabulary; embedding the words of the training text as vectors to obtain the word sequence vector of the training text; inputting the word sequence vector of the training text into a Bi-LSTM to extract the context semantic feature representation of the training text; creating a Graph-MLP relation classification model; training the Graph-MLP relation classification model with the context semantic feature representations of the training text as training samples; and acquiring the word sequence vector of a target text, inputting it into the trained Graph-MLP relation classification model, and outputting the relation between the two entities in the target text.

Description

Entity relation extraction method based on Graph-MLP and adjacent contrast loss
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to an entity relation extraction method based on Graph-MLP and adjacent contrast loss.
Background
With the rapid development of internet technology, network data on the internet grows exponentially, and this massive data carries rich and important information. Relation extraction, a technology in the field of natural language processing, aims to automatically extract semantic relations from text by modeling textual information, distilling effective semantic knowledge so that subsequent tasks can store and filter data effectively. The task of relation extraction is to extract the relations between entity pairs in natural language text. The relation of an entity pair can be formally described as a relation triple <e_1, r, e_2>, where e_1 and e_2 are entities and r is one of the target relation set R = {r_1, r_2, …, r_M}. Successful relation extraction is a foundation for large-scale relation understanding of unstructured text, and its research results are mainly applied in fields such as information retrieval, automatic question answering, and knowledge graph construction. At present, Graph Neural Networks (GNNs) are increasingly used to solve the relation extraction problem, with state-of-the-art results. The core idea is to use a graph neural network model to integrate the topological structure information and attribute feature information in graph data, and thereby provide more refined feature representations of nodes or substructures.
However, conventional GNN-based relation extraction models often require an additional linguistic tool to convert sequential text into graph-structured data that can serve as the input form for the GNN, which makes data processing during relation extraction computationally expensive and prevents end-to-end training. Meanwhile, conventional GNN-based relation extraction models mainly use neighborhood information to realize message passing among nodes, so that the structural information of the graph is learned explicitly in the feed-forward neural network. Such complex message passing often entails heavy computation, which is also a major source of the latency GNNs incur on large-scale graph structures; it reduces the speed of entity relation extraction and makes GNNs difficult to deploy in large-scale industrial applications that require rapid inference on complex structures.
Disclosure of Invention
In order to solve the problems in the background art, the invention provides an entity relation extraction method based on Graph-MLP and adjacent contrast loss, which comprises the following steps:
S1: acquiring training text data with label information; the label information comprises: the relation category between the two entities in the training text;
S2: performing word segmentation on the training text according to the vocabulary; embedding the words of the training text as vectors with a GloVe model to obtain the word sequence vector of the training text;
S3: inputting the word sequence vector of the training text into the Bi-LSTM and extracting the context semantic feature representation of the training text;
S4: creating a Graph-MLP relation classification model; training the Graph-MLP relation classification model with the context semantic feature representation of the training text as training samples; wherein the Graph-MLP relation classification model comprises: a ReLU activation function, an MLP, and a softmax activation function;
S5: acquiring the word sequence vector of the target text, inputting it into the trained Graph-MLP relation classification model, and outputting the relation between the two entities in the target text.
The invention has at least the following beneficial effects:
the invention adopts simple and light MLP to replace the aggregation operation in GCN, does not need to explicitly transmit neighborhood node information through a Graph structure, achieves message transmission and simultaneously improves the calculation efficiency, and the Graph-MLP classification model can realize performance equivalent to that of the Graph model and is more efficient.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific examples, and those skilled in the art will readily understand other advantages and effects of the present invention from the disclosure of this specification. The invention may also be implemented or applied through other, different embodiments, and the details of this specification may be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present invention schematically, and the features of the following embodiments may be combined with one another as long as they do not conflict.
Referring to fig. 1, the present invention provides a method for extracting an entity relationship based on Graph-MLP and adjacent contrast loss, comprising:
S1: acquiring training text data with label information; the label information comprises: the relation category between the two entities in the training text;
In the invention, the TACRED dataset containing 106,264 examples is used, split into 68,124 training examples, 22,631 validation examples, and 15,509 test examples. The dataset covers 41 relation categories plus a special "no_relation" class.
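By way of non-limiting illustration, the splits can be loaded as in the following minimal Python sketch, which assumes the public TACRED JSON release where each example carries a "token" field (the word list) and a "relation" field (the label string); the file paths are placeholders.

```python
import json

def load_tacred(path):
    """Load one TACRED split, keeping only what this method needs."""
    with open(path, encoding="utf-8") as f:
        examples = json.load(f)           # a JSON list of example objects
    # Each example keeps its word sequence and its relation label.
    return [(ex["token"], ex["relation"]) for ex in examples]

train = load_tacred("tacred/train.json")  # expected: 68,124 examples
dev = load_tacred("tacred/dev.json")      # expected: 22,631 examples
test = load_tacred("tacred/test.json")    # expected: 15,509 examples
```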
S2: performing word segmentation on the training text according to the vocabulary; embedding the words of the training text as vectors with a GloVe model to obtain the word sequence vector of the training text;
First, the data is preprocessed. Preprocessing comprises: loading the raw data, cleaning it, mapping word tokens to numeric indices according to the vocabulary, and mapping the various kinds of annotation information to corresponding lists of numeric values according to different rules. Through the preprocessing step, sentence information is converted into numeric form and serves as the input to the embedding model.
Word embedding converts the words of the text into a numeric representation: discrete words are turned into continuous word vectors using a pre-trained GloVe model, where each word is represented by a 300-dimensional real-valued vector.
Word segmentation according to the vocabulary yields n words X = (x_1, x_2, …, x_n); embedding these n words as vectors yields the word sequence vector of the training text, E = (e_1, e_2, …, e_n).
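The token-to-index mapping and embedding lookup can be sketched as follows; the vocabulary `word2idx` and the 300-dimensional GloVe matrix `glove` are assumed to be pre-built and aligned with each other, and the lowercasing and unknown-word handling are assumptions of this sketch.

```python
import numpy as np

def embed_sentence(tokens, word2idx, glove, unk_idx=0):
    """Map word tokens to indices, then look up their GloVe vectors."""
    # Unknown words fall back to a designated index (an assumption here).
    indices = [word2idx.get(tok.lower(), unk_idx) for tok in tokens]
    # glove has shape (|V|, 300); fancy indexing returns (n, 300),
    # i.e. the word sequence vector E = (e_1, ..., e_n).
    return glove[np.asarray(indices)]
```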
Each training text contains two entities, which form an entity pair; the two entities occupy different positions in the text, and both are nouns. For example, for the training text "the registered address of company A is located in region B", company A is entity 1, region B is entity 2, and the relation between entity 1 and entity 2 is a location relation.
S2: inputting the word sequence vector of the training text into Bi-LSTM and extracting to obtain the context semantic feature representation of the training text;
The contextual representation of the text is further captured with a Bi-LSTM model. The input to the Bi-LSTM is E = (e_1, e_2, …, e_n); that is, each item e_1, e_2, …, e_n is fed into the Bi-LSTM layer in turn to compute the forward and backward outputs of the Bi-LSTM at time n, h_n^(f) and h_n^(b) respectively. Adding the forward output and the backward output pointwise gives the result h_n = h_n^(f) + h_n^(b), and the context semantic feature representation of the training text extracted and combined by the Bi-LSTM layer is H = (h_1, h_2, …, h_n).
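A minimal PyTorch sketch of this encoder follows; the hidden size of 300 is an assumption, as the text does not specify it.

```python
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Bi-LSTM whose forward and backward outputs are added pointwise."""
    def __init__(self, emb_dim=300, hidden=300):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)

    def forward(self, E):                # E: (batch, n, emb_dim)
        out, _ = self.bilstm(E)          # (batch, n, 2 * hidden)
        fwd, bwd = out.chunk(2, dim=-1)  # split the two directions
        return fwd + bwd                 # H = (h_1, ..., h_n)
```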
S4: creating a Graph-MLP relation classification model; taking the context semantic feature expression of the training text as a training sample to train the Graph-MLP relation classification model; wherein the Graph-MLP relationship classification model comprises: relu activation function, MLP and softmax activation function;
S41: inputting the context semantic feature representation of the training text into the MLP to compute the feature representation vector of the training text;
preferably, the obtaining of the feature representation vector of the training text by inputting the context semantic feature representation of the training text into MLP includes:
H^(l) = Dropout(LN(σ(H W^(l))))

where σ denotes the ReLU activation function, W^(l) denotes the weight parameter matrix of the l-th layer of the MLP, H denotes the context semantic feature representation of the training text, and H^(l) denotes the feature representation vector of the training text at layer l.
In an l-layer multilayer perceptron, the context semantic feature representation of a sentence is received as input, and a linear transformation is applied to update the feature representation of the sentence. A nonlinear activation function then transforms the feature vectors so that different nonlinear relationships can be learned; LayerNorm is applied for training stability, and finally the output is fed through Dropout to avoid overfitting.
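One such layer can be sketched directly from the formula above; the feature dimension (300) and dropout rate are assumptions.

```python
import torch.nn as nn

class GraphMLPLayer(nn.Module):
    """One Graph-MLP layer: H_out = Dropout(LayerNorm(ReLU(H W)))."""
    def __init__(self, dim=300, p_drop=0.5):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # the weight matrix W^(l)
        self.act = nn.ReLU()               # nonlinear activation sigma
        self.norm = nn.LayerNorm(dim)      # LayerNorm stabilizes training
        self.drop = nn.Dropout(p_drop)     # Dropout guards against overfitting

    def forward(self, H):
        return self.drop(self.norm(self.act(self.linear(H))))
```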
S42: inputting the feature expression vector of the training text into a softmax activation function to predict the relationship between entity pairs in the training text;
preferably, the step of inputting the feature expression vector of the training text into the softmax activation function to predict the relationship between the entity pairs in the training text comprises the following steps:
O = W_o H^(l) + b

P(c | H^(l)) = exp(o_c) / Σ_{k=1}^{M} exp(o_k)

where W_o is the weight matrix of the softmax activation function, b is the bias term of the softmax activation function, H^(l) is the feature representation vector of the training text, P(c | H^(l)) denotes the probability that the relation between the entity pair belongs to class c, o_c and o_k denote the c-th and k-th elements of O, and M is the number of relation classes.
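A sketch of this classification head follows, assuming M = 42 output classes (41 relations plus no_relation) and 300-dimensional features.

```python
import torch
import torch.nn as nn

classifier = nn.Linear(300, 42)       # holds W_o and the bias term b

def predict_relation(H_l):
    """Score all relation classes and normalize with softmax."""
    O = classifier(H_l)               # scores o_1, ..., o_M
    return torch.softmax(O, dim=-1)   # P(c | H^(l)) for every class c
```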
S43: constructing a loss function of the Graph-MLP relational classification model by using a multi-head attention mechanism, a Relu activation function, a cross entropy loss algorithm and an adjacent contrast loss algorithm according to the label information of the training text and the relation prediction result between entity pairs in the training text, and updating parameters of the MLP and softmax activation functions through a back propagation mechanism to complete the training of the Graph-MLP relational classification model;
preferably, the process of constructing the loss function of the Graph-MLP relationship classification model includes:
S441: according to a multi-head attention mechanism, multiplying the context semantic feature representation of the training text by the trainable parameter matrices W_Q and W_K, respectively, to obtain the query Q and the key K, where W_Q, W_K ∈ R^{d×d} and d denotes the dimension of the trainable parameter matrices;
S442: calculating the vector dot product of the training text with the ReLU activation function according to the query Q and the key K;
preferably, the vector dot product of the training text comprises:
M_t = ReLU(Q K^T / √d)

where M_t denotes the vector dot product of the training text, d is the dimension of the feature vectors, ReLU denotes the activation function, the superscript T denotes transposition, and t indexes the multi-head attention heads; the number of heads is usually set to 3.
S443: calculating to obtain a weighting matrix of the training text according to the vector dot product of the training text;
preferably, the weighting matrix of the training text comprises:
A_t(i, j) = exp(M_t(i, j)) / Σ_{u ∈ Num} exp(M_t(i, u))

where A_t denotes the weighting matrix of the training text, u ranges over the word nodes of the training text, Num denotes the set of word nodes in the training text, and t indexes the multi-head attention heads.
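Steps S441–S443 can be sketched together as follows; using one fused d → d·t projection per side (equivalent to t separate d × d matrices) and averaging the heads into a single weighting matrix are assumptions of this sketch.

```python
import torch
import torch.nn as nn

d, t = 300, 3                             # feature dimension, number of heads
W_Q = nn.Linear(d, d * t, bias=False)     # fused per-head query projections
W_K = nn.Linear(d, d * t, bias=False)     # fused per-head key projections

def attention_adjacency(H):               # H: (n, d) word-node features
    n = H.size(0)
    Q = W_Q(H).view(n, t, d).transpose(0, 1)  # (t, n, d) queries per head
    K = W_K(H).view(n, t, d).transpose(0, 1)  # (t, n, d) keys per head
    M = torch.relu(Q @ K.transpose(1, 2) / d ** 0.5)  # M_t = ReLU(QK^T / sqrt(d))
    A = torch.softmax(M, dim=-1)          # normalize over the word-node set Num
    return A.mean(dim=0)                  # average heads into one A_t: (n, n)
```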
S444: constructing a first loss function by using an adjacent contrast loss algorithm according to the weighting matrix of the training text;
preferably, the first loss function comprises:
γ_ij = 1 if a_ij > θ, and γ_ij = 0 otherwise

ℓ_i = −log( Σ_{j=1, j≠i}^{N} γ_ij · exp(sim(v_i, v_j)) / Σ_{k=1, k≠i}^{N} exp(sim(v_i, v_k)) )

loss_NC = (1/N) · Σ_{i=1}^{N} ℓ_i

where loss_NC denotes the first loss function, sim(·) denotes a similarity function, N denotes the number of word nodes in the training text, v_i denotes the i-th word node, v_j denotes the j-th word node, a_ij denotes the value in the i-th row and j-th column of A_t and indicates whether word node v_i and word node v_j are adjacent, and θ denotes a preset threshold.
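A sketch of this adjacent contrast loss follows, taking cosine similarity as sim(·); the threshold value θ = 0.5 is an assumption.

```python
import torch

def nc_loss(V, A, theta=0.5, eps=1e-8):
    """Adjacent contrast loss: pull together word nodes with a_ij > theta."""
    # Pairwise cosine similarities: (N, N).
    sim = torch.cosine_similarity(V.unsqueeze(1), V.unsqueeze(0), dim=-1)
    exp_sim = torch.exp(sim)
    not_self = ~torch.eye(V.size(0), dtype=torch.bool)  # exclude j == i
    positives = (A > theta) & not_self                  # gamma_ij indicator
    numerator = (exp_sim * positives).sum(dim=1)        # adjacent-pair mass
    denominator = (exp_sim * not_self).sum(dim=1)       # all-pair mass
    return -torch.log(numerator / denominator + eps).mean()
```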
S445: constructing a second loss function by using a cross entropy loss algorithm according to the label information of the training text and the relationship prediction result between the entity pairs in the training text;
preferably, the second loss function comprises:
loss_cross-entropy = −(1/N) · Σ_{i=1}^{N} Σ_{c=1}^{M} y_ic · log p(c | H^(l))

where loss_cross-entropy denotes the second loss function, N denotes the number of word nodes in the training text, M is the number of relation classes, y_ic is an indicator that equals 1 if the true class of sample i is c and 0 otherwise, and p(c | H^(l)) denotes the predicted relation between the entity pairs in the training text.
S446: obtaining a loss function of the Graph-MLP relation classification model according to the first loss function and the second loss function;
preferably, the loss function of the Graph-MLP relationship classification model comprises:
loss_final = loss_NC + loss_cross-entropy

where loss_NC denotes the first loss function, loss_cross-entropy denotes the second loss function, and loss_final denotes the loss function of the Graph-MLP relation classification model.
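One back-propagation training step over the combined loss might look as follows; this reuses nc_loss from the sketch above, stands in a bare linear–ReLU–LayerNorm stack for the MLP, and the Adam optimizer and learning rate are assumptions.

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(300, 300), nn.ReLU(), nn.LayerNorm(300))
head = nn.Linear(300, 42)                 # softmax classifier (W_o, b)
optimizer = torch.optim.Adam(
    list(mlp.parameters()) + list(head.parameters()), lr=1e-3)

def train_step(H, A, labels):             # H: (N, d), A: (N, N), labels: (N,)
    H_l = mlp(H)                          # feature representation vectors
    ce = nn.functional.cross_entropy(head(H_l), labels)  # second loss
    loss = nc_loss(H_l, A) + ce           # loss_final = loss_NC + loss_CE
    optimizer.zero_grad()
    loss.backward()                       # back-propagate through MLP and head
    optimizer.step()                      # update MLP and softmax parameters
    return loss.item()
```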
S5: acquiring the word sequence vector of the target text, inputting it into the trained Graph-MLP relation classification model, and outputting the relation between the two entities in the target text.
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. An entity relation extraction method based on Graph-MLP and adjacent contrast loss is characterized by comprising the following steps:
S1: acquiring training text data with label information; the label information comprises: the relation category between the two entities in the training text;
S2: performing word segmentation on the training text according to the vocabulary; embedding the words of the training text as vectors with a GloVe model to obtain the word sequence vector of the training text;
S3: inputting the word sequence vector of the training text into the Bi-LSTM and extracting the context semantic feature representation of the training text;
S4: creating a Graph-MLP relation classification model; training the Graph-MLP relation classification model with the context semantic feature representation of the training text as training samples; wherein the Graph-MLP relation classification model comprises: a ReLU activation function, an MLP, and a softmax activation function;
S5: acquiring the word sequence vector of the target text, inputting it into the trained Graph-MLP relation classification model, and outputting the relation between the two entities in the target text.
2. The method of claim 1, wherein the training of the Graph-MLP relational classification model using the context semantic feature representation of the training text as the training samples comprises:
S41: inputting the context semantic feature representation of the training text into the MLP to compute the feature representation vector of the training text;
S42: inputting the feature representation vector of the training text into the softmax activation function to predict the relationship between entity pairs in the training text;
S43: constructing a loss function of the Graph-MLP relation classification model from the label information of the training text and the relation predictions between entity pairs in the training text, using a multi-head attention mechanism, a ReLU activation function, a cross-entropy loss algorithm, and an adjacent contrast loss algorithm, and updating the parameters of the MLP and softmax activation functions through a back-propagation mechanism to complete the training of the Graph-MLP relation classification model.
4. The entity relation extraction method based on Graph-MLP and adjacent contrast loss according to claim 2, wherein inputting the feature representation vector of the training text into the softmax activation function to predict the relationship between entity pairs in the training text comprises:
O = W_o H^(l) + b

P(c | H^(l)) = exp(o_c) / Σ_{k=1}^{M} exp(o_k)

wherein W_o is the weight matrix of the softmax activation function, b is the bias term of the softmax activation function, P(c | H^(l)) denotes the probability that the relation between an entity pair in the training text belongs to class c, o_c and o_k denote the c-th and k-th elements of O, M is the number of relation classes, and H^(l) denotes the feature representation vector of the training text.
4. The method according to claim 2, wherein the construction process of the loss function of the Graph-MLP relational classification model comprises:
S441: according to a multi-head attention mechanism, multiplying the context semantic feature representation of the training text by the trainable parameter matrices W_Q and W_K, respectively, to obtain the query Q and the key K, where W_Q, W_K ∈ R^{d×d} and d denotes the dimension of the trainable parameter matrices;
S442: calculating the vector dot product of the training text with the ReLU activation function according to the query Q and the key K;
s443: calculating to obtain a weighting matrix of the training text according to the vector dot product of the training text;
s444: constructing a first loss function by using an adjacent contrast loss algorithm according to a weighting matrix of the training text;
s445: constructing a second loss function by using a cross entropy loss algorithm according to the label information of the training text and the relationship prediction result between the entity pairs in the training text;
s446: and obtaining a loss function of the Graph-MLP relation classification model according to the first loss function and the second loss function.
5. The entity relation extraction method based on Graph-MLP and adjacent contrast loss according to claim 4, wherein the vector dot product of the training text comprises:
M_t = ReLU(Q K^T / √d)

wherein M_t denotes the vector dot product of the training text, d is the dimension of the feature vectors, ReLU denotes the activation function, the superscript T denotes transposition, and t indexes the multi-head attention heads.
6. The method as claimed in claim 4, wherein the weighting matrix of the training text comprises:
A_t(i, j) = exp(M_t(i, j)) / Σ_{u ∈ Num} exp(M_t(i, u))

wherein A_t denotes the weighting matrix of the training text, u ranges over the word nodes of the training text, Num denotes the set of word nodes in the training text, and t indexes the multi-head attention heads.
7. The method according to claim 4, wherein the first loss function comprises:
γ_ij = 1 if a_ij > θ, and γ_ij = 0 otherwise

ℓ_i = −log( Σ_{j=1, j≠i}^{N} γ_ij · exp(sim(v_i, v_j)) / Σ_{k=1, k≠i}^{N} exp(sim(v_i, v_k)) )

loss_NC = (1/N) · Σ_{i=1}^{N} ℓ_i

wherein loss_NC denotes the first loss function, sim(·) denotes a similarity function, N denotes the number of word nodes in the training text, v_i denotes the i-th word node, v_j denotes the j-th word node, a_ij denotes the value in the i-th row and j-th column of A_t and indicates whether word node v_i and word node v_j are adjacent, and θ denotes a preset threshold.
8. The method of claim 4, wherein the second loss function comprises:
loss_cross-entropy = −(1/N) · Σ_{i=1}^{N} Σ_{c=1}^{M} y_ic · log p(c | H^(l))

wherein loss_cross-entropy denotes the second loss function, N denotes the number of word nodes in the training text, M is the number of relation classes, y_ic is an indicator that equals 1 if the true class of sample i is c and 0 otherwise, and p(c | H^(l)) denotes the predicted relation between the entity pairs in the training text.
9. The method according to claim 4, wherein the loss function of the Graph-MLP relational classification model comprises:
loss_final = loss_NC + loss_cross-entropy

wherein loss_NC denotes the first loss function, loss_cross-entropy denotes the second loss function, and loss_final denotes the loss function of the Graph-MLP relation classification model.
CN202211594439.2A 2022-12-13 2022-12-13 Entity relation extraction method based on Graph-MLP and adjacent contrast loss Pending CN115934944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211594439.2A CN115934944A (en) 2022-12-13 2022-12-13 Entity relation extraction method based on Graph-MLP and adjacent contrast loss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211594439.2A CN115934944A (en) 2022-12-13 2022-12-13 Entity relation extraction method based on Graph-MLP and adjacent contrast loss

Publications (1)

Publication Number Publication Date
CN115934944A true CN115934944A (en) 2023-04-07

Family

ID=86553563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211594439.2A Pending CN115934944A (en) 2022-12-13 2022-12-13 Entity relation extraction method based on Graph-MLP and adjacent contrast loss

Country Status (1)

Country Link
CN (1) CN115934944A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118093846A (en) * 2024-04-26 2024-05-28 华南理工大学 Knowledge retrieval question-answering method based on association modeling



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination