CN113076391B - Remote supervision relation extraction method based on multi-layer attention mechanism - Google Patents

Remote supervision relation extraction method based on multi-layer attention mechanism Download PDF

Info

Publication number
CN113076391B
CN113076391B (Application CN202110453297.7A)
Authority
CN
China
Prior art keywords
sentence
vector
representing
sentences
packets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110453297.7A
Other languages
Chinese (zh)
Other versions
CN113076391A (en)
Inventor
刘琼昕
王佳升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Publication of CN113076391A publication Critical patent/CN113076391A/en
Application granted granted Critical
Publication of CN113076391B publication Critical patent/CN113076391B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F16/313 Selection or weighting of terms for indexing (G06F16/30 Information retrieval of unstructured textual data)
    • G06F40/30 Semantic analysis (G06F40/00 Handling natural language data)
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N5/022 Knowledge engineering; Knowledge acquisition (G06N5/02 Knowledge representation)
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a remote supervision relation extraction method based on a multi-layer attention mechanism, and belongs to the technical fields of artificial intelligence and natural language processing. The method aims to solve the technical problems that existing remote supervision relation extraction methods extract poorly in noisy environments, neglect noise handling at the sentence-bag level, and do not account for repeated occurrences of entity pairs in sentences. For repeated occurrences of the entity pair in a sentence, a relative-position-feature attention is designed that makes full use of the position information of the entity pair. For the problem of sentence-bag noise, the confidence of each bag is calculated, and bags with the same relation are combined according to confidence so that high-noise bags are paired with low-noise bags, making the noise of the combined groups relatively balanced and improving relation extraction in noisy environments.

Description

Remote supervision relation extraction method based on multi-layer attention mechanism
Technical Field
The invention relates to a remote supervision relation extraction method, and in particular to a remote supervision relation extraction method based on a multi-layer attention mechanism, belonging to the technical fields of artificial intelligence and natural language processing.
Background
Information Extraction (IE) is a fundamental task in natural language processing: it processes unstructured text to extract structured information that serves as input for subsequent natural language processing tasks. In an era of knowledge explosion, people face massive amounts of data every day, so using an information extraction system sensibly to process text efficiently and extract useful information has become very important.
Information extraction, a very important part of natural language processing, itself consists of a series of subtasks such as named entity recognition, relation extraction and event extraction. Among them, Relation Extraction (RE) is a key technology of information extraction; it aims to mine the semantic relations that hold between entities and is of great significance for fields such as automatic knowledge base construction and question-answering systems.
Remote (distant) Supervision Relation Extraction (DSRE) is a mainstream relation extraction approach that labels corpora with an external knowledge base instead of manual annotation; it can obtain a large amount of labeled data at low cost and is widely used at present.
A knowledge base contains a large number of triples of the form (entity 1, entity 2, relation between entity 1 and entity 2). Remote supervision aligns an unlabeled corpus to a known knowledge base: each triple in the knowledge base corresponds to a set of sentences called a Bag. The sentences in a bag are all taken from the unlabeled corpus, and all sentences in the same bag contain the same entity pair (i.e., the two entities of the corresponding triple). Remote supervision relation extraction assumes that every sentence in a bag expresses the relation of the corresponding entity pair, which yields a large amount of labeled data but also introduces a large amount of noise, i.e., incorrectly labeled data. In addition, to better capture the influence of the words in a sentence on the entity pair, relative position features are applied in relation extraction models; a relative position feature is the relative distance between each word in the sentence and the entity pair.
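For illustration only, the short sketch below shows one common way to compute such relative-position features for a single sentence and a single occurrence of the entity pair; the sentence and entity positions are hypothetical and do not come from the patent.

```python
# Minimal illustration of relative position features: for every word, its
# signed distance to the head entity and to the tail entity of the triple.
sentence = ["Steve", "Jobs", "founded", "Apple", "in", "California"]
head_pos, tail_pos = 1, 3            # entity mention positions (hypothetical)

rel_to_head = [k - head_pos for k in range(len(sentence))]
rel_to_tail = [k - tail_pos for k in range(len(sentence))]
print(rel_to_head)   # [-1, 0, 1, 2, 3, 4]
print(rel_to_tail)   # [-3, -2, -1, 0, 1, 2]
# Each distance is later mapped to a trainable embedding vector of dimension d_p.
```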
However, existing remote supervision relation extraction methods ignore the noise of sentence bags and, at the same time, do not handle the repeated occurrence of entity pairs in sentences.
Disclosure of Invention
The invention aims to solve the technical problems that existing remote supervision relation extraction methods extract poorly in noisy environments, neglect noise handling at the sentence-bag level, and do not account for repeated occurrences of entities in sentences, and creatively provides a remote supervision relation extraction method based on a multi-layer attention mechanism.
The innovations of the method are as follows. For repeated occurrences of the entity pair in a sentence, a relative-position-feature attention is designed that makes full use of the position information of the entity pair. For the problem of sentence-bag noise, the confidence of each bag is calculated, and bags sharing the same relation are combined according to confidence so that high-noise bags are paired with low-noise bags, making the noise of the combined groups relatively balanced.
The technical scheme adopted by the method is as follows:
A remote supervision relation extraction method based on a multi-layer attention mechanism comprises the following steps:
S1: Acquire a knowledge base and divide the sentences in the knowledge data set into bags.
S2: Obtain a matrix representation of each sentence.
S3: Obtain the feature vector of each sentence through a sentence encoder.
S4: Obtain the weighted vector representation of each sentence through the position-feature attention.
S5: Obtain the vector representation of each bag through sentence attention.
S6: Combine the bags of the data set into bag pairs.
S7: Obtain the vector representation of each bag pair.
S8: Obtain the loss value of each bag pair.
S9: Update the model parameters through back propagation and gradient descent.
S10: Predict the relations of unlabeled bags with the trained model to obtain new triple knowledge, thereby mining the semantic information of the sentences in the bags.
Advantageous effects
Compared with the prior art, the method of the invention has the following advantages:
1. The invention accounts for repeated occurrences of the entities in a sentence: it computes the sentence's position features for every occurrence of the entity pair, designs a position-feature attention, and makes full use of the position information of the entities in the sentence.
2. The invention performs noise reduction at the level of sentence bags: it combines bags into groups with balanced noise according to the bags' confidence, improving relation extraction in noisy environments.
Drawings
FIG. 1 is an overall block diagram of the method;
FIG. 2 is a block diagram of a PCNN;
FIG. 3 is a graph comparing Precision/Recall curves for the method of the present invention with existing methods.
Detailed Description
The method of the present invention is further described in detail below with reference to the drawings and an embodiment. The embodiment details the concrete implementation of the method and its effect on relation extraction over a mainstream data set.
As shown in FIG. 1, a remote supervision relation extraction method based on a multi-layer attention mechanism comprises the following steps:
step 1: and acquiring a knowledge base, and dividing sentences in the knowledge data set according to packets.
Specifically, sentences in the knowledge data set are divided into packets according to corresponding entity pairs, so that the sentences in the packets have the same entity pair, and the corresponding relationship of the entity pair is assigned to each sentence.
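The following is a minimal sketch of this bag-construction step (the corpus, knowledge base and field names are hypothetical; the patent does not prescribe a particular data structure): sentences are grouped by their entity pair, and every sentence in a bag inherits the relation recorded for that pair in the knowledge base.

```python
from collections import defaultdict

# corpus: list of (sentence, head_entity, tail_entity); kb: {(head, tail): relation}
corpus = [
    ("Steve Jobs founded Apple .", "Steve Jobs", "Apple"),
    ("Apple was started by Steve Jobs .", "Steve Jobs", "Apple"),
    ("Paris is the capital of France .", "France", "Paris"),
]
kb = {("Steve Jobs", "Apple"): "founder_of", ("France", "Paris"): "capital"}

bags = defaultdict(list)
for sent, head, tail in corpus:
    if (head, tail) in kb:                      # align the corpus to the knowledge base
        bags[(head, tail)].append(sent)         # sentences with the same pair share a bag

for (head, tail), sents in bags.items():
    print(head, "|", tail, "|", kb[(head, tail)], "|", len(sents), "sentence(s)")
```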
Step 2: a matrix representation of the sentence is obtained.
The method comprises the following specific steps:
First, let the i-th bag in the data set be B_i = {s_i^1, s_i^2, ..., s_i^{|B_i|}}, where s_i^{|B_i|} denotes the last sentence of bag B_i. Let the j-th sentence in bag B_i be s_i^j = {w_1, w_2, ..., w_{l_ij}}, where l_ij denotes the number of words of sentence s_i^j. Let the positions of the head entity in sentence s_i^j be P_h = {p_h^1, ..., p_h^{n_h}}, where n_h is the number of occurrences of the head entity, and let the positions of the tail entity be P_t = {p_t^1, ..., p_t^{n_t}}, where n_t is the number of occurrences of the tail entity.
Then, the head-entity positions and the tail-entity positions are combined to obtain the set of position pairs P = {(p_h^u, p_t^v) | 1 ≤ u ≤ n_h, 1 ≤ v ≤ n_t}. For each word w_k of sentence s_i^j and each position pair q in P, the position features are calculated with the Embedding technique as (pf_h^{k,q}, pf_t^{k,q}), where pf_h^{k,q} denotes the position feature vector of word w_k relative to the head-entity occurrence in q, pf_t^{k,q} denotes the position feature vector of word w_k relative to the tail-entity occurrence in q, pf_h^{k,q}, pf_t^{k,q} ∈ R^{d_p}, and R^{d_p} denotes the real vector space of dimension d_p.
The set W_k = {w_k^1, ..., w_k^{|P|}} of final vector representations of word w_k over the position pairs in P is calculated by formula (1):
w_k^q = [v_k; pf_h^{k,q}; pf_t^{k,q}]   (1)
where w_k^q denotes the q-th vector of the set W_k, w_k^q ∈ R^d, v_k is the embedded word vector of word w_k, v_k ∈ R^{d_w}, and R^{d_w} denotes the real vector space of dimension d_w. d denotes the final vector representation dimension, d = d_w + 2d_p, where d_w is the dimension of the embedded word vector and d_p is the dimension of the position feature vector.
The set X = {X_1, ..., X_{|P|}} of matrix representations of sentence s_i^j is calculated by equation (2):
X_q = [w_1^q; w_2^q; ...; w_{l_ij}^q]   (2)
where X_q denotes the q-th matrix of the set X, w_k^q denotes the final vector of the k-th word of sentence s_i^j under the q-th position pair, 1 ≤ k ≤ l_ij, X_q ∈ R^{l_ij×d}, and l_ij is the number of words of sentence s_i^j.
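A possible sketch of this step is shown below, assuming trainable embedding tables word_emb, pos_emb_h and pos_emb_t and a maximum clipped relative distance max_dist (these names and values are illustrative assumptions): one d-dimensional vector per word, the word embedding concatenated with the two position-feature embeddings, is built for every combination of a head-entity occurrence and a tail-entity occurrence, yielding the set of sentence matrices of equation (2).

```python
import numpy as np

rng = np.random.default_rng(0)
d_w, d_p, max_dist, vocab = 50, 5, 60, 1000
word_emb  = rng.normal(size=(vocab, d_w))             # word embedding table
pos_emb_h = rng.normal(size=(2 * max_dist + 1, d_p))  # head-relative position embeddings
pos_emb_t = rng.normal(size=(2 * max_dist + 1, d_p))  # tail-relative position embeddings

def sentence_matrices(word_ids, head_positions, tail_positions):
    """One matrix per (head occurrence, tail occurrence) combination."""
    mats = []
    for ph in head_positions:
        for pt in tail_positions:
            rows = []
            for k, w in enumerate(word_ids):
                dh = np.clip(k - ph, -max_dist, max_dist) + max_dist
                dt = np.clip(k - pt, -max_dist, max_dist) + max_dist
                # final word vector: [word ; head position feature ; tail position feature]
                rows.append(np.concatenate([word_emb[w], pos_emb_h[dh], pos_emb_t[dt]]))
            mats.append(np.stack(rows))                # shape (l_ij, d_w + 2*d_p)
    return mats

X_set = sentence_matrices([3, 17, 42, 8, 99], head_positions=[0, 3], tail_positions=[2])
print(len(X_set), X_set[0].shape)                      # 2 matrices, each (5, 60)
```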
Step 3: Obtain the feature vector of each sentence through the sentence encoder.
Specifically, as shown in FIG. 2, the method comprises the following steps:
For each matrix representation X in the set of matrix representations of sentence s_i^j, a Piecewise Convolutional Neural Network (PCNN) containing m convolution kernels {f_1, f_2, ..., f_m} is used to obtain the vector representation of X, where f_m denotes the m-th convolution kernel; each convolution kernel f_i ∈ R^{l×k}, 1 ≤ i ≤ m, where R^{l×k} denotes the space of real matrices of size l×k, l denotes the length of the convolution kernel and k denotes the width of the convolution kernel.
First, convolution features c_ij are extracted by the m convolution kernels:
c_ij = f_i * w_{j-l+1:j}   (3)
where 1 ≤ i ≤ m, 1 ≤ j ≤ |X|+l-1, w_{j-l+1:j} denotes the matrix formed by the vectors of rows j-l+1 to j of matrix X, |X| denotes the number of rows of matrix X, and * is the convolution operation. After convolution, a matrix C ∈ R^{m×(|X|+l-1)} is obtained.
Then, for each convolution kernel f_i, the corresponding convolution result c_i is divided into three parts {c_i1, c_i2, c_i3} according to the two entity positions in the sentence, namely the part from the beginning of the sentence to the head entity, the part from the head entity to the tail entity, and the part from the tail entity to the end of the sentence, and piecewise max pooling is performed:
p_ij = max(c_ij)   (4)
where 1 ≤ i ≤ m and 1 ≤ j ≤ 3.
Each convolution kernel f_i thus corresponds to a 3-dimensional vector p_i = {p_i1, p_i2, p_i3}. The vectors of all convolution kernels are concatenated to obtain the vector p_{1:m} ∈ R^{3m}. Through the tanh function, the final vector representation of matrix X is obtained:
s = tanh(p_{1:m})   (5)
where s ∈ R^{d_s}, d_s = 3m, and d_s denotes the dimension of the final vector representation of matrix X. At this point, the vector representation of the sentence is independent of its length.
For the |P| matrix representations {X_1, ..., X_{|P|}} of sentence s_i^j, formulas (3), (4) and (5) yield the corresponding feature vectors S = {s_1, ..., s_{|P|}}.
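Below is a simplified NumPy sketch of the piecewise convolutional encoder of equations (3)-(5) (forward pass only); for simplicity the segmentation indices are applied directly to the convolution output, and all parameter shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def pcnn_encode(X, kernels, head_pos, tail_pos):
    """Piecewise CNN: convolve, split each filter's output at the two entity
    positions, max-pool the three segments, concatenate, apply tanh."""
    n, d = X.shape
    m, l, _ = kernels.shape                                 # m filters of size (l, d)
    Xp = np.vstack([np.zeros((l - 1, d)), X, np.zeros((l - 1, d))])  # pad: n + l - 1 outputs
    C = np.empty((m, n + l - 1))
    for i in range(m):
        for j in range(n + l - 1):
            C[i, j] = np.sum(kernels[i] * Xp[j:j + l])      # c_ij = f_i * w_{j-l+1:j}, eq. (3)
    a, b = sorted((head_pos, tail_pos))
    pooled = []
    for c in C:                                             # piecewise max pooling, eq. (4)
        segs = [c[:a + 1], c[a + 1:b + 1], c[b + 1:]]
        pooled.append([s.max() if s.size else 0.0 for s in segs])
    return np.tanh(np.asarray(pooled).reshape(-1))          # s = tanh(p_{1:m}), eq. (5)

X = rng.normal(size=(9, 60))                 # one sentence matrix (9 words, d = 60)
kernels = rng.normal(size=(230, 3, 60))      # m = 230 convolution kernels of length 3
s = pcnn_encode(X, kernels, head_pos=1, tail_pos=6)
print(s.shape)                               # (690,) = 3 * m
```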
Step 4: Obtain the weighted vector representation of the sentence through the position-feature attention.
Specifically:
For sentence s_i^j, its weighted feature vector ŝ is obtained by formula (6):
ŝ = Σ_k α_k s_k   (6)
where α_k is the weight of the feature vector s_k, calculated by equation (7):
α_k = exp(e_k) / Σ_l exp(e_l)   (7)
where e_k denotes the matching score between the vector s_k and the relation r corresponding to sentence s_i^j, and e_l denotes the matching score between the l-th vector s_l of the feature vector set S of sentence s_i^j and the relation r corresponding to sentence s_i^j. e_k is calculated by equation (8) from s_k and the embedded vector of the relation r corresponding to sentence s_i^j, which is obtained through the Embedding technique.
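A sketch of the position-feature attention of equations (6) and (7) is given below; since equation (8) is only reproduced as an image in the patent, the matching score e_k is assumed here to be a dot product between each feature vector and the relation embedding.

```python
import numpy as np

def position_feature_attention(S, r_emb):
    """S: (n_reprs, d_s) feature vectors of one sentence, one per entity-pair
    occurrence combination; r_emb: embedding of the sentence's labelled relation.
    Returns the weighted feature vector of eq. (6)."""
    e = S @ r_emb                         # matching scores (assumed dot-product form)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                  # softmax weights, eq. (7)
    return alpha @ S                      # weighted sum, eq. (6)

rng = np.random.default_rng(2)
S = rng.normal(size=(4, 690))             # 4 position configurations of one sentence
r_emb = rng.normal(size=690)
s_bar = position_feature_attention(S, r_emb)
print(s_bar.shape)                        # (690,)
```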
Step 5: Obtain the vector representation of each bag through sentence attention.
Specifically:
For bag B_i = {s_i^1, ..., s_i^{|B_i|}}, the vector representation A_i of B_i is calculated by formula (9):
A_i = Σ_j β_j ŝ_j   (9)
where β_j is the weight of ŝ_j, the weighted feature vector of the j-th sentence of bag B_i, and |B_i| denotes the number of sentences in bag B_i. β_j is calculated by equation (10):
β_j = exp(g_j) / Σ_l exp(g_l)   (10)
where g_j denotes the matching score between ŝ_j and the relation r corresponding to bag B_i, and g_l denotes the matching score between ŝ_l, the weighted feature vector of the l-th sentence of bag B_i, and the relation r corresponding to bag B_i. g_j is calculated by equation (11) from ŝ_j and the embedded vector of the relation r corresponding to bag B_i, which is obtained through the Embedding technique.
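A corresponding sketch of the sentence-level attention of equations (9) and (10); as above, the matching score g_j of equation (11) is assumed to be a dot product with the relation embedding.

```python
import numpy as np

def bag_representation(sent_vecs, r_emb):
    """sent_vecs: (|B_i|, d_s) weighted feature vectors of the sentences in bag B_i.
    Returns the bag vector A_i of eq. (9)."""
    g = sent_vecs @ r_emb                 # per-sentence matching scores (assumed form)
    beta = np.exp(g - g.max())
    beta /= beta.sum()                    # softmax weights, eq. (10)
    return beta @ sent_vecs               # weighted sum over sentences, eq. (9)

rng = np.random.default_rng(3)
bag = rng.normal(size=(6, 690))           # a bag of 6 sentence vectors
A_i = bag_representation(bag, rng.normal(size=690))
print(A_i.shape)                          # (690,)
```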
Step 6: Combine the bags of the data set into bag pairs.
Specifically, the method comprises the following steps:
Step 6.1: For all sentence bags {B_1, B_2, ..., B_N} in the data set, where N is the number of sentence bags in the data set, and the relation set R = {r_1, r_2, ..., r_{|R|}}, the bags that share the same relation r are combined into a bag group G_r = {B_1^r, B_2^r, ..., B_n^r}, where n denotes the number of bags in G_r.
Step 6.2: The confidence c_i of the i-th bag B_i^r in bag group G_r is calculated by equation (12), which normalizes the matching score between bag B_i^r and the corresponding relation r of the bag group G_r over the matching scores between bag B_i^r and every relation in the relation set R = {r_1, r_2, ..., r_{|R|}}; for each relation r_k in R, the matching score between B_i^r and r_k is calculated by equation (13).
Step 6.3: The bags are combined into bag pairs according to their confidence.
Specifically, the method comprises the following steps:
Step 6.3.1: The bags in bag group G_r = {B_1^r, ..., B_n^r} are reordered by increasing confidence, giving {B_(1)^r, B_(2)^r, ..., B_(n)^r}.
Step 6.3.2: Bag pairs are formed in a head-to-tail manner, giving {(B_(1)^r, B_(n)^r), (B_(2)^r, B_(n-1)^r), ..., (B_(⌊n/2⌋)^r, B_(n-⌊n/2⌋+1)^r)}, where B_(⌊n/2⌋)^r denotes the ⌊n/2⌋-th bag of the reordered bag group G_r and ⌊·⌋ denotes rounding the division down.
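A sketch of steps 6.1-6.3 follows; the bag-relation matching score of equation (13) is assumed to be a dot product between the bag vector and a relation embedding, and the confidence of equation (12) is assumed to be its softmax-normalized value over all relations, since both equations appear only as images in the patent.

```python
import numpy as np

def bag_confidence(A, rel_embs, r_idx):
    """Confidence of one bag for its labelled relation r_idx (assumed form of eqs. (12)-(13))."""
    scores = rel_embs @ A                 # matching score with every relation in R
    probs = np.exp(scores - scores.max())
    return probs[r_idx] / probs.sum()

def pair_bags_by_confidence(bag_vecs, rel_embs, r_idx):
    """Sort the bags of one group by confidence and pair them head-to-tail (step 6.3)."""
    conf = [bag_confidence(A, rel_embs, r_idx) for A in bag_vecs]
    order = np.argsort(conf)              # ascending confidence
    n = len(order)
    return [(order[i], order[n - 1 - i]) for i in range(n // 2)]  # low paired with high

rng = np.random.default_rng(4)
bags = rng.normal(size=(8, 690))          # 8 bag vectors sharing relation r_idx
pairs = pair_bags_by_confidence(bags, rng.normal(size=(53, 690)), r_idx=7)
print(pairs)                              # 4 pairs of bag indices
```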
Step 7: Obtain the vector representation of each bag pair.
Specifically:
For each bag pair (B_u^r, B_v^r), its vector representation Â is calculated by equation (14) as the weighted combination of the vector representations of the two bags, where the weight of each bag is calculated by formulas (15) and (16) from its matching score with r, and r denotes the relation corresponding to the bag group G_r.
Step 8: Obtain the loss value of each bag pair.
Specifically, the method comprises the following steps:
Step 8.1: Compute the prediction score vector o of the bag pair (B_u^r, B_v^r) over the relation set R:
o = M Â + b   (17)
where M ∈ R^{|R|×k} is a weight matrix, |R| is the total number of relations, k is the dimension of the embedded vector of each relation in the relation set R, b ∈ R^{|R|} denotes the bias, and o_i denotes the prediction score of the relation r_i for the bag pair (B_u^r, B_v^r).
Step 8.2: Calculate the probability P of r_i using Softmax:
P(r_i | (B_u^r, B_v^r); θ) = exp(o_i) / Σ_j exp(o_j)   (18)
where θ denotes the training parameters of the relation extractor, o_j denotes the prediction score of the j-th relation of the relation set for the bag pair (B_u^r, B_v^r), o_i denotes the prediction score of relation r_i for the bag pair (B_u^r, B_v^r), and n is the number of relations in the relation set R.
Step 8.3: Obtain the loss value L of the bag pair (B_u^r, B_v^r) with the cross-entropy loss function:
L = -log P(r_i | (B_u^r, B_v^r); θ)   (19)
where P denotes the probability of r_i.
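A sketch of step 8: a linear layer over the bag-pair vector produces one score per relation (equation (17)), Softmax turns the scores into probabilities (equation (18)), and the negative log-probability of the labelled relation is the cross-entropy loss (equation (19)); the weight matrix and bias below are random stand-ins for the trainable parameters.

```python
import numpy as np

def pair_loss(pair_vec, M, b, r_idx):
    """Cross-entropy loss of one bag pair for its labelled relation r_idx."""
    o = M @ pair_vec + b                  # prediction scores, eq. (17)
    p = np.exp(o - o.max())
    p /= p.sum()                          # softmax probabilities, eq. (18)
    return -np.log(p[r_idx])              # cross-entropy loss, eq. (19)

rng = np.random.default_rng(6)
n_rel, dim = 53, 690
loss = pair_loss(rng.normal(size=dim), rng.normal(size=(n_rel, dim)),
                 rng.normal(size=n_rel), r_idx=7)
print(float(loss))
```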
Step 9: Update the model parameters through back propagation and gradient descent.
Step 10: Use the trained model to predict the relations of unlabeled bags, obtaining new triple knowledge and thereby mining the semantic information of the sentences in the bags.
Embodiment: experimental verification
Tables 1 and 2 show the experimental comparison of the method of the invention with various baseline methods on the NYT data set; the method of the invention achieves a clear improvement over the baselines on both the P@N metric and the AUC value. In addition, as can be seen from FIG. 3, the relation extraction performance of the method of the invention is superior to that of the best-performing existing relation extraction methods.
TABLE 1 P@N results of the baseline methods and the method of the invention
TABLE 2 AUC values of the baseline methods and the method of the invention

Claims (7)

1. A remote supervision relation extraction method based on a multi-layer attention mechanism, characterized by comprising the following steps:
S1: acquiring a knowledge base, and dividing the sentences in the knowledge data set into bags;
S2: obtaining a matrix representation of each sentence:
first, letting the i-th bag in the data set be B_i = {s_i^1, s_i^2, ..., s_i^{|B_i|}}, where s_i^{|B_i|} denotes the last sentence of bag B_i; letting the j-th sentence in bag B_i be s_i^j = {w_1, w_2, ..., w_{l_ij}}, where l_ij denotes the number of words of sentence s_i^j; letting the positions of the head entity in sentence s_i^j be P_h = {p_h^1, ..., p_h^{n_h}}, where n_h is the number of occurrences of the head entity; and letting the positions of the tail entity in sentence s_i^j be P_t = {p_t^1, ..., p_t^{n_t}}, where n_t is the number of occurrences of the tail entity;
then, combining the head-entity positions and the tail-entity positions to obtain the set of position pairs P = {(p_h^u, p_t^v) | 1 ≤ u ≤ n_h, 1 ≤ v ≤ n_t}; for each word w_k of sentence s_i^j and each position pair q in P, calculating the position features with the Embedding technique as (pf_h^{k,q}, pf_t^{k,q}), where pf_h^{k,q} denotes the position feature vector of word w_k relative to the head-entity occurrence in q, pf_t^{k,q} denotes the position feature vector of word w_k relative to the tail-entity occurrence in q, pf_h^{k,q}, pf_t^{k,q} ∈ R^{d_p}, and R^{d_p} denotes the real vector space of dimension d_p;
calculating by formula (1) the set W_k = {w_k^1, ..., w_k^{|P|}} of final vector representations of word w_k over the position pairs in P, wherein:
w_k^q = [v_k; pf_h^{k,q}; pf_t^{k,q}]   (1)
where w_k^q denotes the q-th vector of the set W_k, w_k^q ∈ R^d, v_k is the embedded word vector of word w_k, v_k ∈ R^{d_w}, R^{d_w} denotes the real vector space of dimension d_w, d denotes the final vector representation dimension, d = d_w + 2d_p, d_w denotes the dimension of the embedded word vector, and d_p denotes the dimension of the position feature vector;
calculating by equation (2) the set X = {X_1, ..., X_{|P|}} of matrix representations of sentence s_i^j, wherein:
X_q = [w_1^q; w_2^q; ...; w_{l_ij}^q]   (2)
where X_q denotes the q-th matrix of the set X, w_k^q denotes the final vector of the k-th word of sentence s_i^j under the q-th position pair, 1 ≤ k ≤ l_ij, X_q ∈ R^{l_ij×d}, and l_ij is the number of words of sentence s_i^j;
S3: acquiring the feature vector of each sentence through a sentence encoder;
for each matrix representation X in the set of matrix representations of sentence s_i^j, using a piecewise convolutional neural network containing m convolution kernels {f_1, f_2, ..., f_m} to obtain the vector representation of X, where f_m denotes the m-th convolution kernel, each convolution kernel f_i ∈ R^{l×k}, 1 ≤ i ≤ m, R^{l×k} denotes the space of real matrices of size l×k, l denotes the length of the convolution kernel, and k denotes the width of the convolution kernel;
first, extracting convolution features c_ij with the m convolution kernels:
c_ij = f_i * w_{j-l+1:j}   (3)
where 1 ≤ i ≤ m, 1 ≤ j ≤ |X|+l-1, w_{j-l+1:j} denotes the matrix formed by the vectors of rows j-l+1 to j of matrix X, |X| denotes the number of rows of matrix X, and * is the convolution operation; after convolution, obtaining a matrix C ∈ R^{m×(|X|+l-1)};
then, dividing the convolution result c_i corresponding to each convolution kernel f_i into three parts {c_i1, c_i2, c_i3} according to the two entity positions in the sentence, namely the part from the beginning of the sentence to the head entity, the part from the head entity to the tail entity, and the part from the tail entity to the end of the sentence, and performing piecewise max pooling:
p_ij = max(c_ij)   (4)
where 1 ≤ i ≤ m and 1 ≤ j ≤ 3;
each convolution kernel f_i corresponding to a 3-dimensional vector p_i = {p_i1, p_i2, p_i3}; concatenating the vectors of all convolution kernels to obtain the vector p_{1:m} ∈ R^{3m}; obtaining the final vector representation of matrix X through the tanh function:
s = tanh(p_{1:m})   (5)
where s ∈ R^{d_s}, d_s = 3m, and d_s denotes the dimension of the final vector representation of matrix X, so that the vector representation of the sentence is independent of its length;
for the |P| matrix representations {X_1, ..., X_{|P|}} of sentence s_i^j, obtaining the corresponding feature vectors S = {s_1, ..., s_{|P|}} through formulas (3), (4) and (5);
S4: obtaining the weighted vector representation of each sentence through the position-feature attention:
for sentence s_i^j, obtaining its weighted feature vector ŝ by formula (6):
ŝ = Σ_k α_k s_k   (6)
where α_k is the weight of the feature vector s_k, calculated by equation (7):
α_k = exp(e_k) / Σ_l exp(e_l)   (7)
where e_k denotes the matching score between the vector s_k and the relation r corresponding to sentence s_i^j, and e_l denotes the matching score between the l-th vector s_l of the feature vector set S of sentence s_i^j and the relation r corresponding to sentence s_i^j; e_k is calculated by equation (8) from s_k and the embedded vector of the relation r corresponding to sentence s_i^j, obtained through the Embedding technique;
S5: obtaining the vector representation of each bag through sentence attention;
S6: combining the bags of the data set into bag pairs;
S7: obtaining the vector representation of each bag pair;
S8: obtaining the loss value of each bag pair;
S9: updating the model parameters through back propagation and gradient descent;
S10: according to the trained model, predicting the relations of unlabeled bags to obtain new triple knowledge, thereby mining the semantic information of the sentences in the bags.
2. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein in step S1, the sentences in the knowledge data set are divided into bags according to their corresponding entity pairs, so that the sentences in one bag share the same entity pair, and the relation corresponding to that entity pair is assigned to each sentence.
3. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein step S5 comprises:
for bag B_i = {s_i^1, ..., s_i^{|B_i|}}, calculating the vector representation A_i of B_i by formula (9):
A_i = Σ_j β_j ŝ_j   (9)
where β_j is the weight of ŝ_j, the weighted feature vector of the j-th sentence of bag B_i, and |B_i| denotes the number of sentences of bag B_i; β_j is calculated by equation (10):
β_j = exp(g_j) / Σ_l exp(g_l)   (10)
where g_j denotes the matching score between ŝ_j and the relation r corresponding to bag B_i, and g_l denotes the matching score between ŝ_l, the weighted feature vector of the l-th sentence of bag B_i, and the relation r corresponding to bag B_i; g_j is calculated by equation (11) from ŝ_j and the embedded vector of the relation r corresponding to bag B_i, obtained through the Embedding technique.
4. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein step S6 comprises:
step 6.1: for all sentence bags {B_1, B_2, ..., B_N} in the data set, where N is the number of sentence bags in the data set, and the relation set R = {r_1, r_2, ..., r_{|R|}}, combining the bags that share the same relation r into a bag group G_r = {B_1^r, B_2^r, ..., B_n^r}, where n denotes the number of bags in G_r;
step 6.2: calculating the confidence c_i of bag B_i^r by equation (12), which normalizes the matching score between bag B_i^r and the corresponding relation r of the bag group G_r over the matching scores between bag B_i^r and every relation in the relation set R = {r_1, r_2, ..., r_{|R|}}; for each relation r_k in R, the matching score between B_i^r and r_k is calculated by equation (13);
step 6.3: combining the bags into bag pairs according to their confidence.
5. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 4, wherein step 6.3 comprises:
step 6.3.1: reordering the bags of bag group G_r = {B_1^r, ..., B_n^r} by increasing confidence to obtain {B_(1)^r, B_(2)^r, ..., B_(n)^r};
step 6.3.2: forming bag pairs in a head-to-tail manner, obtaining {(B_(1)^r, B_(n)^r), (B_(2)^r, B_(n-1)^r), ..., (B_(⌊n/2⌋)^r, B_(n-⌊n/2⌋+1)^r)}, where B_(⌊n/2⌋)^r denotes the ⌊n/2⌋-th bag of the reordered bag group G_r and ⌊·⌋ denotes rounding the division down.
6. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein step S7 comprises:
for each bag pair (B_u^r, B_v^r), calculating its vector representation Â by equation (14) as the weighted combination of the vector representations of the two bags, where the weight of each bag is calculated by formulas (15) and (16) from its matching score with r, and r denotes the relation corresponding to the bag group G_r.
7. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein step S8 comprises:
step 8.1: calculating the prediction score vector o of the bag pair (B_u^r, B_v^r) over the relation set R:
o = M Â + b   (17)
where M ∈ R^{|R|×k} is a weight matrix, |R| is the total number of relations, k is the dimension of the embedded vector of each relation in the relation set R, b ∈ R^{|R|} denotes the bias, and o_i denotes the prediction score of the relation label r_i for the bag pair (B_u^r, B_v^r);
step 8.2: calculating the probability P of r_i using Softmax:
P(r_i | (B_u^r, B_v^r); θ) = exp(o_i) / Σ_j exp(o_j)   (18)
where θ denotes the training parameters of the relation extractor, o_j denotes the prediction score of the j-th relation of the relation set for the bag pair (B_u^r, B_v^r), o_i denotes the prediction score of relation r_i for the bag pair (B_u^r, B_v^r), and n is the number of relations in the relation set R;
step 8.3: obtaining the loss value L of the bag pair (B_u^r, B_v^r) with the cross-entropy loss function:
L = -log P(r_i | (B_u^r, B_v^r); θ)   (19)
where P denotes the probability of r_i.
CN202110453297.7A 2021-01-27 2021-04-26 Remote supervision relation extraction method based on multi-layer attention mechanism Active CN113076391B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021101120392 2021-01-27
CN202110112039 2021-01-27

Publications (2)

Publication Number Publication Date
CN113076391A (en) 2021-07-06
CN113076391B (en) 2022-09-20

Family

ID=76618797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110453297.7A Active CN113076391B (en) 2021-01-27 2021-04-26 Remote supervision relation extraction method based on multi-layer attention mechanism

Country Status (1)

Country Link
CN (1) CN113076391B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761936B (en) * 2021-08-19 2023-04-07 哈尔滨工业大学(威海) Multi-task chapter-level event extraction method based on multi-head self-attention mechanism
CN114757179A (en) * 2022-04-13 2022-07-15 成都信息工程大学 Entity relationship joint extraction method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125434A (en) * 2019-11-26 2020-05-08 北京理工大学 Relation extraction method and system based on ensemble learning
CN111191461A (en) * 2019-06-06 2020-05-22 北京理工大学 Remote supervision relation extraction method based on course learning
CN111914558A (en) * 2020-07-31 2020-11-10 湖北工业大学 Course knowledge relation extraction method and system based on sentence bag attention remote supervision

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11625573B2 (en) * 2018-10-29 2023-04-11 International Business Machines Corporation Relation extraction from text using machine learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191461A (en) * 2019-06-06 2020-05-22 北京理工大学 Remote supervision relation extraction method based on course learning
CN111125434A (en) * 2019-11-26 2020-05-08 北京理工大学 Relation extraction method and system based on ensemble learning
CN111914558A (en) * 2020-07-31 2020-11-10 湖北工业大学 Course knowledge relation extraction method and system based on sentence bag attention remote supervision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A soft-label method for noise-tolerant distantly supervised relation extraction; Liu Tianyu et al.; Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing; 2017-12-31; 1790-1795 *
Research on distant supervision relation extraction for agricultural pests and diseases based on a multi-layer attention mechanism; Le Yi et al.; Journal of Anhui Agricultural University; 2020-09-09; Vol. 47, No. 04; 682-686 *

Also Published As

Publication number Publication date
CN113076391A (en) 2021-07-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant