CN113076391B - Remote supervision relation extraction method based on multi-layer attention mechanism - Google Patents

Remote supervision relation extraction method based on multi-layer attention mechanism Download PDF

Info

Publication number
CN113076391B
CN113076391B (Application CN202110453297.7A)
Authority
CN
China
Prior art keywords
sentence
vector
representing
sentences
packets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110453297.7A
Other languages
Chinese (zh)
Other versions
CN113076391A (en)
Inventor
刘琼昕
王佳升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Publication of CN113076391A publication Critical patent/CN113076391A/en
Application granted granted Critical
Publication of CN113076391B publication Critical patent/CN113076391B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F16/313 Selection or weighting of terms for indexing (G06F16/30 Information retrieval of unstructured textual data)
    • G06F40/30 Semantic analysis (G06F40/00 Handling natural language data)
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N5/022 Knowledge engineering; Knowledge acquisition (G06N5/02 Knowledge representation)
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a remote supervision relation extraction method based on a multi-layer attention mechanism, and belongs to the technical fields of artificial intelligence and natural language processing. The method aims to solve the technical problems that existing remote supervision relation extraction methods extract poorly in noisy environments, neglect noise handling at the sentence-bag level, and do not account for repeated occurrences of entity pairs in sentences. For repeated occurrences of the entity pair in a sentence, a relative-position-feature attention is designed that makes full use of the position information of the entity pair. For the problem of sentence-bag noise, the confidence of each bag is calculated, and bags with the same relation are combined according to confidence so that high-noise bags are paired with low-noise bags, making the noise of the combined groups relatively balanced and improving relation extraction in noisy environments.

Description

Remote supervision relation extraction method based on multi-layer attention mechanism
Technical Field
The invention relates to a remote supervision relation extraction method, and in particular to a remote supervision relation extraction method based on a multi-layer attention mechanism, belonging to the technical fields of artificial intelligence and natural language processing.
Background
Information Extraction (IE) is a fundamental task in natural language processing: it processes unstructured text to extract structured information that serves as input for subsequent natural language processing tasks. In an era of knowledge explosion, people face massive amounts of data every day, so using an information extraction system sensibly to process text efficiently and extract useful information has become very important.
Information extraction, a very important part of natural language processing, itself consists of a series of subtasks such as named entity recognition, relation extraction and event extraction. Among them, Relation Extraction (RE) is a key technology of information extraction; it aims to mine the semantic relations that hold between entities and is of great significance for fields such as automatic knowledge base construction and question-answering systems.
Remote (distant) Supervision Relation Extraction (DSRE) is a mainstream relation extraction approach that labels corpora with an external knowledge base instead of manual annotation; it can obtain a large amount of labeled data at low cost and is widely used at present.
A knowledge base contains a large number of triples of the form (entity 1, entity 2, relation between entity 1 and entity 2). Remote supervision aligns an unlabeled corpus to a known knowledge base: each triple in the knowledge base corresponds to a set of sentences called a Bag. The sentences in a bag are all taken from the unlabeled corpus, and all sentences in the same bag contain the same entity pair (i.e., the two entities of the corresponding triple). Remote supervision relation extraction assumes that every sentence in a bag expresses the relation of the corresponding entity pair, which yields a large amount of labeled data but also introduces a large amount of noise, i.e., incorrectly labeled data. In addition, to better capture the influence of the words in a sentence on the entity pair, relative position features are applied in relation extraction models; a relative position feature is the relative distance between each word in the sentence and the entity pair.
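For illustration only, the short sketch below shows one common way to compute such relative-position features for a single sentence and a single occurrence of the entity pair; the sentence and entity positions are hypothetical and do not come from the patent.

```python
# Minimal illustration of relative position features: for every word, its
# signed distance to the head entity and to the tail entity of the triple.
sentence = ["Steve", "Jobs", "founded", "Apple", "in", "California"]
head_pos, tail_pos = 1, 3            # entity mention positions (hypothetical)

rel_to_head = [k - head_pos for k in range(len(sentence))]
rel_to_tail = [k - tail_pos for k in range(len(sentence))]
print(rel_to_head)   # [-1, 0, 1, 2, 3, 4]
print(rel_to_tail)   # [-3, -2, -1, 0, 1, 2]
# Each distance is later mapped to a trainable embedding vector of dimension d_p.
```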
However, existing remote supervision relation extraction methods ignore the noise of sentence bags and, at the same time, do not handle the repeated occurrence of entity pairs in sentences.
Disclosure of Invention
The invention aims to solve the technical problems that existing remote supervision relation extraction methods extract poorly in noisy environments, neglect noise handling at the sentence-bag level, and do not account for repeated occurrences of entities in sentences, and creatively provides a remote supervision relation extraction method based on a multi-layer attention mechanism.
The innovations of the method are as follows. For repeated occurrences of the entity pair in a sentence, a relative-position-feature attention is designed that makes full use of the position information of the entity pair. For the problem of sentence-bag noise, the confidence of each bag is calculated, and bags sharing the same relation are combined according to confidence so that high-noise bags are paired with low-noise bags, making the noise of the combined groups relatively balanced.
The technical scheme adopted by the method is as follows:
A remote supervision relation extraction method based on a multi-layer attention mechanism comprises the following steps:
S1: Acquire a knowledge base and divide the sentences in the knowledge data set into bags.
S2: Obtain a matrix representation of each sentence.
S3: Obtain the feature vector of each sentence through a sentence encoder.
S4: Obtain the weighted vector representation of each sentence through the position-feature attention.
S5: Obtain the vector representation of each bag through sentence attention.
S6: Combine the bags of the data set into bag pairs.
S7: Obtain the vector representation of each bag pair.
S8: Obtain the loss value of each bag pair.
S9: Update the model parameters through back propagation and gradient descent.
S10: Predict the relations of unlabeled bags with the trained model to obtain new triple knowledge, thereby mining the semantic information of the sentences in the bags.
Advantageous effects
Compared with the prior art, the method of the invention has the following advantages:
1. The invention accounts for repeated occurrences of the entities in a sentence: it computes the sentence's position features for every occurrence of the entity pair, designs a position-feature attention, and makes full use of the position information of the entities in the sentence.
2. The invention performs noise reduction at the level of sentence bags: it combines bags into groups with balanced noise according to the bags' confidence, improving relation extraction in noisy environments.
Drawings
FIG. 1 is an overall block diagram of the method;
FIG. 2 is a block diagram of a PCNN;
FIG. 3 is a graph comparing Precision/Recall curves for the method of the present invention with existing methods.
Detailed Description
The method of the present invention is further described in detail below with reference to the drawings and an embodiment. The embodiment details the concrete implementation of the method and its effect on relation extraction over a mainstream data set.
As shown in FIG. 1, a remote supervision relation extraction method based on a multi-layer attention mechanism comprises the following steps:
step 1: and acquiring a knowledge base, and dividing sentences in the knowledge data set according to packets.
Specifically, sentences in the knowledge data set are divided into packets according to corresponding entity pairs, so that the sentences in the packets have the same entity pair, and the corresponding relationship of the entity pair is assigned to each sentence.
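The following is a minimal sketch of this bag-construction step (the corpus, knowledge base and field names are hypothetical; the patent does not prescribe a particular data structure): sentences are grouped by their entity pair, and every sentence in a bag inherits the relation recorded for that pair in the knowledge base.

```python
from collections import defaultdict

# corpus: list of (sentence, head_entity, tail_entity); kb: {(head, tail): relation}
corpus = [
    ("Steve Jobs founded Apple .", "Steve Jobs", "Apple"),
    ("Apple was started by Steve Jobs .", "Steve Jobs", "Apple"),
    ("Paris is the capital of France .", "France", "Paris"),
]
kb = {("Steve Jobs", "Apple"): "founder_of", ("France", "Paris"): "capital"}

bags = defaultdict(list)
for sent, head, tail in corpus:
    if (head, tail) in kb:                      # align the corpus to the knowledge base
        bags[(head, tail)].append(sent)         # sentences with the same pair share a bag

for (head, tail), sents in bags.items():
    print(head, "|", tail, "|", kb[(head, tail)], "|", len(sents), "sentence(s)")
```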
Step 2: a matrix representation of the sentence is obtained.
The method comprises the following specific steps:
First, let the i-th bag in the data set be B_i = {s_i^1, s_i^2, ..., s_i^{|B_i|}}, where s_i^{|B_i|} denotes the last sentence of bag B_i. Let the j-th sentence in bag B_i be s_i^j = {w_1, w_2, ..., w_{l_ij}}, where l_ij denotes the number of words of sentence s_i^j. Let the positions of the head entity in sentence s_i^j be P_h = {p_h^1, ..., p_h^{n_h}}, where n_h is the number of occurrences of the head entity, and let the positions of the tail entity be P_t = {p_t^1, ..., p_t^{n_t}}, where n_t is the number of occurrences of the tail entity.
Then, the head-entity positions and the tail-entity positions are combined to obtain the set of position pairs P = {(p_h^u, p_t^v) | 1 ≤ u ≤ n_h, 1 ≤ v ≤ n_t}. For each word w_k of sentence s_i^j and each position pair q in P, the position features are calculated with the Embedding technique as (pf_h^{k,q}, pf_t^{k,q}), where pf_h^{k,q} denotes the position feature vector of word w_k relative to the head-entity occurrence in q, pf_t^{k,q} denotes the position feature vector of word w_k relative to the tail-entity occurrence in q, pf_h^{k,q}, pf_t^{k,q} ∈ R^{d_p}, and R^{d_p} denotes the real vector space of dimension d_p.
The set W_k = {w_k^1, ..., w_k^{|P|}} of final vector representations of word w_k over the position pairs in P is calculated by formula (1):
w_k^q = [v_k; pf_h^{k,q}; pf_t^{k,q}]   (1)
where w_k^q denotes the q-th vector of the set W_k, w_k^q ∈ R^d, v_k is the embedded word vector of word w_k, v_k ∈ R^{d_w}, and R^{d_w} denotes the real vector space of dimension d_w. d denotes the final vector representation dimension, d = d_w + 2d_p, where d_w is the dimension of the embedded word vector and d_p is the dimension of the position feature vector.
The set X = {X_1, ..., X_{|P|}} of matrix representations of sentence s_i^j is calculated by equation (2):
X_q = [w_1^q; w_2^q; ...; w_{l_ij}^q]   (2)
where X_q denotes the q-th matrix of the set X, w_k^q denotes the final vector of the k-th word of sentence s_i^j under the q-th position pair, 1 ≤ k ≤ l_ij, X_q ∈ R^{l_ij×d}, and l_ij is the number of words of sentence s_i^j.
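A possible sketch of this step is shown below, assuming trainable embedding tables word_emb, pos_emb_h and pos_emb_t and a maximum clipped relative distance max_dist (these names and values are illustrative assumptions): one d-dimensional vector per word, the word embedding concatenated with the two position-feature embeddings, is built for every combination of a head-entity occurrence and a tail-entity occurrence, yielding the set of sentence matrices of equation (2).

```python
import numpy as np

rng = np.random.default_rng(0)
d_w, d_p, max_dist, vocab = 50, 5, 60, 1000
word_emb  = rng.normal(size=(vocab, d_w))             # word embedding table
pos_emb_h = rng.normal(size=(2 * max_dist + 1, d_p))  # head-relative position embeddings
pos_emb_t = rng.normal(size=(2 * max_dist + 1, d_p))  # tail-relative position embeddings

def sentence_matrices(word_ids, head_positions, tail_positions):
    """One matrix per (head occurrence, tail occurrence) combination."""
    mats = []
    for ph in head_positions:
        for pt in tail_positions:
            rows = []
            for k, w in enumerate(word_ids):
                dh = np.clip(k - ph, -max_dist, max_dist) + max_dist
                dt = np.clip(k - pt, -max_dist, max_dist) + max_dist
                # final word vector: [word ; head position feature ; tail position feature]
                rows.append(np.concatenate([word_emb[w], pos_emb_h[dh], pos_emb_t[dt]]))
            mats.append(np.stack(rows))                # shape (l_ij, d_w + 2*d_p)
    return mats

X_set = sentence_matrices([3, 17, 42, 8, 99], head_positions=[0, 3], tail_positions=[2])
print(len(X_set), X_set[0].shape)                      # 2 matrices, each (5, 60)
```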
Step 3: Obtain the feature vector of each sentence through the sentence encoder.
Specifically, as shown in FIG. 2, the method comprises the following steps:
For each matrix representation X in the set of matrix representations of sentence s_i^j, a Piecewise Convolutional Neural Network (PCNN) containing m convolution kernels {f_1, f_2, ..., f_m} is used to obtain the vector representation of X, where f_m denotes the m-th convolution kernel; each convolution kernel f_i ∈ R^{l×k}, 1 ≤ i ≤ m, where R^{l×k} denotes the space of real matrices of size l×k, l denotes the length of the convolution kernel and k denotes the width of the convolution kernel.
First, convolution features c_ij are extracted by the m convolution kernels:
c_ij = f_i * w_{j-l+1:j}   (3)
where 1 ≤ i ≤ m, 1 ≤ j ≤ |X|+l-1, w_{j-l+1:j} denotes the matrix formed by the vectors of rows j-l+1 to j of matrix X, |X| denotes the number of rows of matrix X, and * is the convolution operation. After convolution, a matrix C ∈ R^{m×(|X|+l-1)} is obtained.
Then, for each convolution kernel f_i, the corresponding convolution result c_i is divided into three parts {c_i1, c_i2, c_i3} according to the two entity positions in the sentence, namely the part from the beginning of the sentence to the head entity, the part from the head entity to the tail entity, and the part from the tail entity to the end of the sentence, and piecewise max pooling is performed:
p_ij = max(c_ij)   (4)
where 1 ≤ i ≤ m and 1 ≤ j ≤ 3.
Each convolution kernel f_i thus corresponds to a 3-dimensional vector p_i = {p_i1, p_i2, p_i3}. The vectors of all convolution kernels are concatenated to obtain the vector p_{1:m} ∈ R^{3m}. Through the tanh function, the final vector representation of matrix X is obtained:
s = tanh(p_{1:m})   (5)
where s ∈ R^{d_s}, d_s = 3m, and d_s denotes the dimension of the final vector representation of matrix X. At this point, the vector representation of the sentence is independent of its length.
For the |P| matrix representations {X_1, ..., X_{|P|}} of sentence s_i^j, formulas (3), (4) and (5) yield the corresponding feature vectors S = {s_1, ..., s_{|P|}}.
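Below is a simplified NumPy sketch of the piecewise convolutional encoder of equations (3)-(5) (forward pass only); for simplicity the segmentation indices are applied directly to the convolution output, and all parameter shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def pcnn_encode(X, kernels, head_pos, tail_pos):
    """Piecewise CNN: convolve, split each filter's output at the two entity
    positions, max-pool the three segments, concatenate, apply tanh."""
    n, d = X.shape
    m, l, _ = kernels.shape                                 # m filters of size (l, d)
    Xp = np.vstack([np.zeros((l - 1, d)), X, np.zeros((l - 1, d))])  # pad: n + l - 1 outputs
    C = np.empty((m, n + l - 1))
    for i in range(m):
        for j in range(n + l - 1):
            C[i, j] = np.sum(kernels[i] * Xp[j:j + l])      # c_ij = f_i * w_{j-l+1:j}, eq. (3)
    a, b = sorted((head_pos, tail_pos))
    pooled = []
    for c in C:                                             # piecewise max pooling, eq. (4)
        segs = [c[:a + 1], c[a + 1:b + 1], c[b + 1:]]
        pooled.append([s.max() if s.size else 0.0 for s in segs])
    return np.tanh(np.asarray(pooled).reshape(-1))          # s = tanh(p_{1:m}), eq. (5)

X = rng.normal(size=(9, 60))                 # one sentence matrix (9 words, d = 60)
kernels = rng.normal(size=(230, 3, 60))      # m = 230 convolution kernels of length 3
s = pcnn_encode(X, kernels, head_pos=1, tail_pos=6)
print(s.shape)                               # (690,) = 3 * m
```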
Step 4: Obtain the weighted vector representation of the sentence through the position-feature attention.
Specifically:
For sentence s_i^j, its weighted feature vector ŝ is obtained by formula (6):
ŝ = Σ_k α_k s_k   (6)
where α_k is the weight of the feature vector s_k, calculated by equation (7):
α_k = exp(e_k) / Σ_l exp(e_l)   (7)
where e_k denotes the matching score between the vector s_k and the relation r corresponding to sentence s_i^j, and e_l denotes the matching score between the l-th vector s_l of the feature vector set S of sentence s_i^j and the relation r corresponding to sentence s_i^j. e_k is calculated by equation (8) from s_k and the embedded vector of the relation r corresponding to sentence s_i^j, which is obtained through the Embedding technique.
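A sketch of the position-feature attention of equations (6) and (7) is given below; since equation (8) is only reproduced as an image in the patent, the matching score e_k is assumed here to be a dot product between each feature vector and the relation embedding.

```python
import numpy as np

def position_feature_attention(S, r_emb):
    """S: (n_reprs, d_s) feature vectors of one sentence, one per entity-pair
    occurrence combination; r_emb: embedding of the sentence's labelled relation.
    Returns the weighted feature vector of eq. (6)."""
    e = S @ r_emb                         # matching scores (assumed dot-product form)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                  # softmax weights, eq. (7)
    return alpha @ S                      # weighted sum, eq. (6)

rng = np.random.default_rng(2)
S = rng.normal(size=(4, 690))             # 4 position configurations of one sentence
r_emb = rng.normal(size=690)
s_bar = position_feature_attention(S, r_emb)
print(s_bar.shape)                        # (690,)
```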
Step 5: Obtain the vector representation of each bag through sentence attention.
Specifically:
For bag B_i = {s_i^1, ..., s_i^{|B_i|}}, the vector representation A_i of B_i is calculated by formula (9):
A_i = Σ_j β_j ŝ_j   (9)
where β_j is the weight of ŝ_j, the weighted feature vector of the j-th sentence of bag B_i, and |B_i| denotes the number of sentences in bag B_i. β_j is calculated by equation (10):
β_j = exp(g_j) / Σ_l exp(g_l)   (10)
where g_j denotes the matching score between ŝ_j and the relation r corresponding to bag B_i, and g_l denotes the matching score between ŝ_l, the weighted feature vector of the l-th sentence of bag B_i, and the relation r corresponding to bag B_i. g_j is calculated by equation (11) from ŝ_j and the embedded vector of the relation r corresponding to bag B_i, which is obtained through the Embedding technique.
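A corresponding sketch of the sentence-level attention of equations (9) and (10); as above, the matching score g_j of equation (11) is assumed to be a dot product with the relation embedding.

```python
import numpy as np

def bag_representation(sent_vecs, r_emb):
    """sent_vecs: (|B_i|, d_s) weighted feature vectors of the sentences in bag B_i.
    Returns the bag vector A_i of eq. (9)."""
    g = sent_vecs @ r_emb                 # per-sentence matching scores (assumed form)
    beta = np.exp(g - g.max())
    beta /= beta.sum()                    # softmax weights, eq. (10)
    return beta @ sent_vecs               # weighted sum over sentences, eq. (9)

rng = np.random.default_rng(3)
bag = rng.normal(size=(6, 690))           # a bag of 6 sentence vectors
A_i = bag_representation(bag, rng.normal(size=690))
print(A_i.shape)                          # (690,)
```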
Step 6: Combine the bags of the data set into bag pairs.
Specifically, the method comprises the following steps:
Step 6.1: For all sentence bags {B_1, B_2, ..., B_N} in the data set, where N is the number of sentence bags in the data set, and the relation set R = {r_1, r_2, ..., r_{|R|}}, the bags that share the same relation r are combined into a bag group G_r = {B_1^r, B_2^r, ..., B_n^r}, where n denotes the number of bags in G_r.
Step 6.2: The confidence c_i of the i-th bag B_i^r in bag group G_r is calculated by equation (12), which normalizes the matching score between bag B_i^r and the corresponding relation r of the bag group G_r over the matching scores between bag B_i^r and every relation in the relation set R = {r_1, r_2, ..., r_{|R|}}; for each relation r_k in R, the matching score between B_i^r and r_k is calculated by equation (13).
Step 6.3: The bags are combined into bag pairs according to their confidence.
Specifically, the method comprises the following steps:
Step 6.3.1: The bags in bag group G_r = {B_1^r, ..., B_n^r} are reordered by increasing confidence, giving {B_(1)^r, B_(2)^r, ..., B_(n)^r}.
Step 6.3.2: Bag pairs are formed in a head-to-tail manner, giving {(B_(1)^r, B_(n)^r), (B_(2)^r, B_(n-1)^r), ..., (B_(⌊n/2⌋)^r, B_(n-⌊n/2⌋+1)^r)}, where B_(⌊n/2⌋)^r denotes the ⌊n/2⌋-th bag of the reordered bag group G_r and ⌊·⌋ denotes rounding the division down.
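A sketch of steps 6.1-6.3 follows; the bag-relation matching score of equation (13) is assumed to be a dot product between the bag vector and a relation embedding, and the confidence of equation (12) is assumed to be its softmax-normalized value over all relations, since both equations appear only as images in the patent.

```python
import numpy as np

def bag_confidence(A, rel_embs, r_idx):
    """Confidence of one bag for its labelled relation r_idx (assumed form of eqs. (12)-(13))."""
    scores = rel_embs @ A                 # matching score with every relation in R
    probs = np.exp(scores - scores.max())
    return probs[r_idx] / probs.sum()

def pair_bags_by_confidence(bag_vecs, rel_embs, r_idx):
    """Sort the bags of one group by confidence and pair them head-to-tail (step 6.3)."""
    conf = [bag_confidence(A, rel_embs, r_idx) for A in bag_vecs]
    order = np.argsort(conf)              # ascending confidence
    n = len(order)
    return [(order[i], order[n - 1 - i]) for i in range(n // 2)]  # low paired with high

rng = np.random.default_rng(4)
bags = rng.normal(size=(8, 690))          # 8 bag vectors sharing relation r_idx
pairs = pair_bags_by_confidence(bags, rng.normal(size=(53, 690)), r_idx=7)
print(pairs)                              # 4 pairs of bag indices
```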
Step 7: Obtain the vector representation of each bag pair.
Specifically:
For each bag pair (B_u^r, B_v^r), its vector representation Â is calculated by equation (14) as the weighted combination of the vector representations of the two bags, where the weight of each bag is calculated by formulas (15) and (16) from its matching score with r, and r denotes the relation corresponding to the bag group G_r.
Step 8: Obtain the loss value of each bag pair.
Specifically, the method comprises the following steps:
Step 8.1: Compute the prediction score vector o of the bag pair (B_u^r, B_v^r) over the relation set R:
o = M Â + b   (17)
where M ∈ R^{|R|×k} is a weight matrix, |R| is the total number of relations, k is the dimension of the embedded vector of each relation in the relation set R, b ∈ R^{|R|} denotes the bias, and o_i denotes the prediction score of the relation r_i for the bag pair (B_u^r, B_v^r).
Step 8.2: Calculate the probability P of r_i using Softmax:
P(r_i | (B_u^r, B_v^r); θ) = exp(o_i) / Σ_j exp(o_j)   (18)
where θ denotes the training parameters of the relation extractor, o_j denotes the prediction score of the j-th relation of the relation set for the bag pair (B_u^r, B_v^r), o_i denotes the prediction score of relation r_i for the bag pair (B_u^r, B_v^r), and n is the number of relations in the relation set R.
Step 8.3: Obtain the loss value L of the bag pair (B_u^r, B_v^r) with the cross-entropy loss function:
L = -log P(r_i | (B_u^r, B_v^r); θ)   (19)
where P denotes the probability of r_i.
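A sketch of step 8: a linear layer over the bag-pair vector produces one score per relation (equation (17)), Softmax turns the scores into probabilities (equation (18)), and the negative log-probability of the labelled relation is the cross-entropy loss (equation (19)); the weight matrix and bias below are random stand-ins for the trainable parameters.

```python
import numpy as np

def pair_loss(pair_vec, M, b, r_idx):
    """Cross-entropy loss of one bag pair for its labelled relation r_idx."""
    o = M @ pair_vec + b                  # prediction scores, eq. (17)
    p = np.exp(o - o.max())
    p /= p.sum()                          # softmax probabilities, eq. (18)
    return -np.log(p[r_idx])              # cross-entropy loss, eq. (19)

rng = np.random.default_rng(6)
n_rel, dim = 53, 690
loss = pair_loss(rng.normal(size=dim), rng.normal(size=(n_rel, dim)),
                 rng.normal(size=n_rel), r_idx=7)
print(float(loss))
```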
Step 9: Update the model parameters through back propagation and gradient descent.
Step 10: Use the trained model to predict the relations of unlabeled bags, obtaining new triple knowledge and thereby mining the semantic information of the sentences in the bags.
Embodiment: experimental verification
Tables 1 and 2 show the experimental comparison of the method of the invention with various baseline methods on the NYT data set; the method of the invention achieves a clear improvement over the baselines on both the P@N metric and the AUC value. In addition, as can be seen from FIG. 3, the relation extraction performance of the method of the invention is superior to that of the best-performing existing relation extraction methods.
TABLE 1 P@N results of the baseline methods and the method of the invention
TABLE 2 AUC values of the baseline methods and the method of the invention

Claims (7)

1. A remote supervision relation extraction method based on a multi-layer attention mechanism, characterized by comprising the following steps:
S1: acquiring a knowledge base, and dividing the sentences in the knowledge data set into bags;
S2: obtaining a matrix representation of each sentence:
first, letting the i-th bag in the data set be B_i = {s_i^1, s_i^2, ..., s_i^{|B_i|}}, where s_i^{|B_i|} denotes the last sentence of bag B_i; letting the j-th sentence in bag B_i be s_i^j = {w_1, w_2, ..., w_{l_ij}}, where l_ij denotes the number of words of sentence s_i^j; letting the positions of the head entity in sentence s_i^j be P_h = {p_h^1, ..., p_h^{n_h}}, where n_h is the number of occurrences of the head entity; and letting the positions of the tail entity in sentence s_i^j be P_t = {p_t^1, ..., p_t^{n_t}}, where n_t is the number of occurrences of the tail entity;
then, combining the head-entity positions and the tail-entity positions to obtain the set of position pairs P = {(p_h^u, p_t^v) | 1 ≤ u ≤ n_h, 1 ≤ v ≤ n_t}; for each word w_k of sentence s_i^j and each position pair q in P, calculating the position features with the Embedding technique as (pf_h^{k,q}, pf_t^{k,q}), where pf_h^{k,q} denotes the position feature vector of word w_k relative to the head-entity occurrence in q, pf_t^{k,q} denotes the position feature vector of word w_k relative to the tail-entity occurrence in q, pf_h^{k,q}, pf_t^{k,q} ∈ R^{d_p}, and R^{d_p} denotes the real vector space of dimension d_p;
calculating by formula (1) the set W_k = {w_k^1, ..., w_k^{|P|}} of final vector representations of word w_k over the position pairs in P, wherein:
w_k^q = [v_k; pf_h^{k,q}; pf_t^{k,q}]   (1)
where w_k^q denotes the q-th vector of the set W_k, w_k^q ∈ R^d, v_k is the embedded word vector of word w_k, v_k ∈ R^{d_w}, R^{d_w} denotes the real vector space of dimension d_w, d denotes the final vector representation dimension, d = d_w + 2d_p, d_w denotes the dimension of the embedded word vector, and d_p denotes the dimension of the position feature vector;
calculating by equation (2) the set X = {X_1, ..., X_{|P|}} of matrix representations of sentence s_i^j, wherein:
X_q = [w_1^q; w_2^q; ...; w_{l_ij}^q]   (2)
where X_q denotes the q-th matrix of the set X, w_k^q denotes the final vector of the k-th word of sentence s_i^j under the q-th position pair, 1 ≤ k ≤ l_ij, X_q ∈ R^{l_ij×d}, and l_ij is the number of words of sentence s_i^j;
S3: acquiring the feature vector of each sentence through a sentence encoder;
for each matrix representation X in the set of matrix representations of sentence s_i^j, using a piecewise convolutional neural network containing m convolution kernels {f_1, f_2, ..., f_m} to obtain the vector representation of X, where f_m denotes the m-th convolution kernel, each convolution kernel f_i ∈ R^{l×k}, 1 ≤ i ≤ m, R^{l×k} denotes the space of real matrices of size l×k, l denotes the length of the convolution kernel, and k denotes the width of the convolution kernel;
first, extracting convolution features c_ij with the m convolution kernels:
c_ij = f_i * w_{j-l+1:j}   (3)
where 1 ≤ i ≤ m, 1 ≤ j ≤ |X|+l-1, w_{j-l+1:j} denotes the matrix formed by the vectors of rows j-l+1 to j of matrix X, |X| denotes the number of rows of matrix X, and * is the convolution operation; after convolution, obtaining a matrix C ∈ R^{m×(|X|+l-1)};
then, dividing the convolution result c_i corresponding to each convolution kernel f_i into three parts {c_i1, c_i2, c_i3} according to the two entity positions in the sentence, namely the part from the beginning of the sentence to the head entity, the part from the head entity to the tail entity, and the part from the tail entity to the end of the sentence, and performing piecewise max pooling:
p_ij = max(c_ij)   (4)
where 1 ≤ i ≤ m and 1 ≤ j ≤ 3;
each convolution kernel f_i corresponding to a 3-dimensional vector p_i = {p_i1, p_i2, p_i3}; concatenating the vectors of all convolution kernels to obtain the vector p_{1:m} ∈ R^{3m}; obtaining the final vector representation of matrix X through the tanh function:
s = tanh(p_{1:m})   (5)
where s ∈ R^{d_s}, d_s = 3m, and d_s denotes the dimension of the final vector representation of matrix X, so that the vector representation of the sentence is independent of its length;
for the |P| matrix representations {X_1, ..., X_{|P|}} of sentence s_i^j, obtaining the corresponding feature vectors S = {s_1, ..., s_{|P|}} through formulas (3), (4) and (5);
S4: obtaining the weighted vector representation of each sentence through the position-feature attention:
for sentence s_i^j, obtaining its weighted feature vector ŝ by formula (6):
ŝ = Σ_k α_k s_k   (6)
where α_k is the weight of the feature vector s_k, calculated by equation (7):
α_k = exp(e_k) / Σ_l exp(e_l)   (7)
where e_k denotes the matching score between the vector s_k and the relation r corresponding to sentence s_i^j, and e_l denotes the matching score between the l-th vector s_l of the feature vector set S of sentence s_i^j and the relation r corresponding to sentence s_i^j; e_k is calculated by equation (8) from s_k and the embedded vector of the relation r corresponding to sentence s_i^j, obtained through the Embedding technique;
S5: obtaining the vector representation of each bag through sentence attention;
S6: combining the bags of the data set into bag pairs;
S7: obtaining the vector representation of each bag pair;
S8: obtaining the loss value of each bag pair;
S9: updating the model parameters through back propagation and gradient descent;
S10: according to the trained model, predicting the relations of unlabeled bags to obtain new triple knowledge, thereby mining the semantic information of the sentences in the bags.
2. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein in step S1, the sentences in the knowledge data set are divided into bags according to their corresponding entity pairs, so that the sentences in one bag share the same entity pair, and the relation corresponding to that entity pair is assigned to each sentence.
3. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein step S5 comprises:
for bag B_i = {s_i^1, ..., s_i^{|B_i|}}, calculating the vector representation A_i of B_i by formula (9):
A_i = Σ_j β_j ŝ_j   (9)
where β_j is the weight of ŝ_j, the weighted feature vector of the j-th sentence of bag B_i, and |B_i| denotes the number of sentences of bag B_i; β_j is calculated by equation (10):
β_j = exp(g_j) / Σ_l exp(g_l)   (10)
where g_j denotes the matching score between ŝ_j and the relation r corresponding to bag B_i, and g_l denotes the matching score between ŝ_l, the weighted feature vector of the l-th sentence of bag B_i, and the relation r corresponding to bag B_i; g_j is calculated by equation (11) from ŝ_j and the embedded vector of the relation r corresponding to bag B_i, obtained through the Embedding technique.
4. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein step S6 comprises:
step 6.1: for all sentence bags {B_1, B_2, ..., B_N} in the data set, where N is the number of sentence bags in the data set, and the relation set R = {r_1, r_2, ..., r_{|R|}}, combining the bags that share the same relation r into a bag group G_r = {B_1^r, B_2^r, ..., B_n^r}, where n denotes the number of bags in G_r;
step 6.2: calculating the confidence c_i of bag B_i^r by equation (12), which normalizes the matching score between bag B_i^r and the corresponding relation r of the bag group G_r over the matching scores between bag B_i^r and every relation in the relation set R = {r_1, r_2, ..., r_{|R|}}; for each relation r_k in R, the matching score between B_i^r and r_k is calculated by equation (13);
step 6.3: combining the bags into bag pairs according to their confidence.
5. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 4, wherein step 6.3 comprises:
step 6.3.1: reordering the bags of bag group G_r = {B_1^r, ..., B_n^r} by increasing confidence to obtain {B_(1)^r, B_(2)^r, ..., B_(n)^r};
step 6.3.2: forming bag pairs in a head-to-tail manner, obtaining {(B_(1)^r, B_(n)^r), (B_(2)^r, B_(n-1)^r), ..., (B_(⌊n/2⌋)^r, B_(n-⌊n/2⌋+1)^r)}, where B_(⌊n/2⌋)^r denotes the ⌊n/2⌋-th bag of the reordered bag group G_r and ⌊·⌋ denotes rounding the division down.
6. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein step S7 comprises:
for each bag pair (B_u^r, B_v^r), calculating its vector representation Â by equation (14) as the weighted combination of the vector representations of the two bags, where the weight of each bag is calculated by formulas (15) and (16) from its matching score with r, and r denotes the relation corresponding to the bag group G_r.
7. The remote supervision relation extraction method based on a multi-layer attention mechanism according to claim 1, wherein step S8 comprises:
step 8.1: calculating the prediction score vector o of the bag pair (B_u^r, B_v^r) over the relation set R:
o = M Â + b   (17)
where M ∈ R^{|R|×k} is a weight matrix, |R| is the total number of relations, k is the dimension of the embedded vector of each relation in the relation set R, b ∈ R^{|R|} denotes the bias, and o_i denotes the prediction score of the relation label r_i for the bag pair (B_u^r, B_v^r);
step 8.2: calculating the probability P of r_i using Softmax:
P(r_i | (B_u^r, B_v^r); θ) = exp(o_i) / Σ_j exp(o_j)   (18)
where θ denotes the training parameters of the relation extractor, o_j denotes the prediction score of the j-th relation of the relation set for the bag pair (B_u^r, B_v^r), o_i denotes the prediction score of relation r_i for the bag pair (B_u^r, B_v^r), and n is the number of relations in the relation set R;
step 8.3: obtaining the loss value L of the bag pair (B_u^r, B_v^r) with the cross-entropy loss function:
L = -log P(r_i | (B_u^r, B_v^r); θ)   (19)
where P denotes the probability of r_i.
CN202110453297.7A 2021-01-27 2021-04-26 Remote supervision relation extraction method based on multi-layer attention mechanism Active CN113076391B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021101120392 2021-01-27
CN202110112039 2021-01-27

Publications (2)

Publication Number Publication Date
CN113076391A (en) 2021-07-06
CN113076391B (en) 2022-09-20

Family

ID=76618797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110453297.7A Active CN113076391B (en) 2021-01-27 2021-04-26 Remote supervision relation extraction method based on multi-layer attention mechanism

Country Status (1)

Country Link
CN (1) CN113076391B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761936B (en) * 2021-08-19 2023-04-07 哈尔滨工业大学(威海) Multi-task chapter-level event extraction method based on multi-head self-attention mechanism
CN114757179A (en) * 2022-04-13 2022-07-15 成都信息工程大学 Entity relationship joint extraction method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125434A (en) * 2019-11-26 2020-05-08 北京理工大学 Relation extraction method and system based on ensemble learning
CN111191461A (en) * 2019-06-06 2020-05-22 北京理工大学 Remote supervision relation extraction method based on course learning
CN111914558A (en) * 2020-07-31 2020-11-10 湖北工业大学 Course knowledge relation extraction method and system based on sentence bag attention remote supervision

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11625573B2 (en) * 2018-10-29 2023-04-11 International Business Machines Corporation Relation extraction from text using machine learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191461A (en) * 2019-06-06 2020-05-22 北京理工大学 Remote supervision relation extraction method based on course learning
CN111125434A (en) * 2019-11-26 2020-05-08 北京理工大学 Relation extraction method and system based on ensemble learning
CN111914558A (en) * 2020-07-31 2020-11-10 湖北工业大学 Course knowledge relation extraction method and system based on sentence bag attention remote supervision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A soft-label method for noise-tolerant distantly supervised relation extraction; Liu Tianyu et al.; Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing; 2017-12-31; 1790-1795 *
Research on distant supervision relation extraction for agricultural pests and diseases based on a multi-layer attention mechanism; Le Yi et al.; Journal of Anhui Agricultural University; 2020-09-09; Vol. 47, No. 04; 682-686 *

Also Published As

Publication number Publication date
CN113076391A (en) 2021-07-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant