CN115545005A - Remote supervision relation extraction method fusing knowledge and constraint graph - Google Patents

Info

Publication number
CN115545005A
Authority
CN
China
Prior art keywords
entity
vector
sentence
graph
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211185558.2A
Other languages
Chinese (zh)
Inventor
刘琼昕
牛文涛
王佳升
王甜甜
方胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN202211185558.2A
Publication of CN115545005A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a remote supervision relation extraction method fusing knowledge and a constraint graph, belonging to the technical field of relation extraction from text data in natural language processing. The method supplements additional information using the knowledge context of entities, passes information between relations through a constraint graph of entity types and relations, and fuses sentence semantic information, entity context information, and entity-relation constraint information through a multi-source fusion attention mechanism. This benefits the representation learning of sentences and entity relations and improves relation extraction performance. The method addresses both the data noise problem and the long-tail relation problem of remote supervision relation extraction. It is particularly suited to relation extraction on large-scale text data and in complex text environments, and to extracting structured fact information from unstructured text.

Description

Remote supervision relation extraction method fusing knowledge and constraint graph
Technical Field
The invention relates to a remote supervision relation extraction method fusing knowledge and a constraint graph, and belongs to the technical field of text data relation extraction in computer natural language processing.
Background
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between people and computers in natural language.
With the development of artificial intelligence, relation extraction based on machine learning and deep learning has become a research hotspot in natural language processing. Training relation extraction models on large-scale text corpora so that deep semantic features of text are mined automatically has become the main approach to relation extraction. It enables intelligent semantic analysis and overcomes the limitations of methods built on manually designed rules. Deep learning makes it possible to perform relation extraction automatically on large-scale text data and in complex text environments and to extract structured fact information from unstructured text, with broad application prospects in natural language processing tasks such as question answering systems, knowledge graphs, and search engines.
In recent years, relation extraction based on remote supervision (also known as distant supervision) has become a popular direction within deep-learning-based relation extraction. The method aligns a large knowledge base with a text corpus and thereby automatically obtains a large amount of labeled text data for training a relation extraction model. Specifically, if a <head entity, tail entity, relation> triple exists in the knowledge base, remote supervision assumes that every sentence in the corpus containing both the head entity and the tail entity expresses the relation of that triple. Compared with other relation extraction methods, remote supervision trains on a large amount of automatically labeled data, which alleviates the difficulty of obtaining training data and gives the trained model strong generalization ability.
However, data sets labeled automatically by remote supervision have two serious problems. The first is the noise problem: a large proportion of the sentences in the labeled data set carry wrong labels, which degrades the training of the relation extraction model. The second is the long-tail relation problem: a few relations account for most of the data, so the model cannot adequately learn the long-tail relations. Existing remote supervision relation extraction methods either fail to solve these two problems or address only one of them. As a result the model cannot be trained effectively, the practical relation extraction performance is poor, and the mining of deep semantic features of the text is seriously impaired.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a remote supervision relation extraction method fusing knowledge and a constraint graph, in order to effectively solve the data noise problem and the long-tail problem in remote supervision relation extraction.
The innovations of the method are as follows. Additional information is supplemented using the knowledge context of entities; information is passed between relations through the constraint graph of entity types and relations; and a multi-source fusion attention mechanism fuses sentence semantic information, entity context information, and entity-relation constraint information. This benefits the representation learning of sentences and entity relations and improves relation extraction performance. For unstructured text annotated with entity information, the method can effectively identify the relation between entity pairs. It is particularly effective for relation extraction on large-scale text data and in complex text environments, and for extracting structured fact information from unstructured text.
The invention is realized by adopting the following technical scheme.
A remote supervision relation extraction method fusing knowledge and constraint graphs comprises the following steps:
Step 1: Collect, from the knowledge base, the neighbor entities of the entities in the remote supervision data set, including one-hop and two-hop neighbor entities. The entities in the remote supervision data set and their neighbor entities form an entity set. An entity neighbor graph is constructed from the entity set, and a constraint graph is constructed in combination with the set of relations between the entities. In the remote supervision data set, sentences with the same entity pair are combined into sentence packets.
For the entities in the remote supervision data set, the Flair named entity recognition tool can be used to recognize their entity types, which form the entity type set.
Step 2: Obtain a word embedding vector for each word of each sentence in the packet, and a feature vector representation of the sentence.
Specifically, for the sentences in the sentence packets obtained in step 1, the word embedding vector of each word is obtained with the word2vec tool, and the feature vector representation of the sentence is obtained using a segmented convolutional neural network (PCNN) as the sentence encoder.
Step 3: For each entity in the entity set, collect its entity attribute information from the knowledge base, comprising entity name, entity alias, entity type, and entity description. For each entity, the attribute information is concatenated and input into an attribute encoder, and the output matrix is averaged over its column vectors to obtain the attribute vector of the entity.
In particular, BERT may be used as the attribute encoder.
Step 4: Construct an adjacency matrix from the entity neighbor graph. With the adjacency matrix and the entity attribute vectors as input, a neighbor graph encoder built from a graph convolutional neural network produces the knowledge context vector representation of the target entity.
Step 5: Construct an adjacency matrix from the constraint graph. With the adjacency matrix and the vector representations of the entity types and relations as input, a constraint graph encoder built from a graph convolutional neural network produces the vector representations of the entity types and relations.
Step 6: Taking the sentence feature vector representations, the entity context vector representations, and the vector representations of the entity types and relations as input, calculate the feature vector representation of the sentence packet through a multi-source fusion attention mechanism.
Step 7: From the feature vector representation of the sentence packet, predict the relation label of the sentence packet through a relation classifier.
Further, in step 1, the knowledge base contains entity pairs and the relations between them, and each entity has attribute information such as its entity name, entity alias, entity type, and entity description. The remote supervision data set is a training corpus labeled by the remote supervision method, in which natural language text is labeled using the entity pairs in the knowledge base and their corresponding relations. Specifically, if a <head entity, tail entity, relation> triple exists in the knowledge base, any sentence containing both the head entity and the tail entity is considered to express the relation of that triple, which yields the annotated data.
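The labeling assumption and the grouping of sentences into packets can be illustrated with a minimal sketch. The toy knowledge base, the relation names, and the plain substring matching below are illustrative assumptions, not part of the described method; a real pipeline would rely on entity linking rather than string containment.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {
    ("Barack Obama", "Hawaii"): "place_of_birth",
    ("Apple", "Tim Cook"): "ceo",
}

def annotate(sentences, kb):
    """Label every sentence that mentions both entities of a KB triple."""
    labeled = []
    for sent in sentences:
        for (head, tail), relation in kb.items():
            if head in sent and tail in sent:
                labeled.append((head, tail, relation, sent))
    return labeled

def build_packets(labeled):
    """Group labeled sentences that share the same entity pair."""
    packets = defaultdict(list)
    for head, tail, relation, sent in labeled:
        packets[(head, tail, relation)].append(sent)
    return dict(packets)

corpus = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama visited Hawaii last week.",  # noisy: not about birth
    "Tim Cook leads Apple.",
]
packets = build_packets(annotate(corpus, KB))
```

The second sentence receives the label `place_of_birth` even though it does not express that relation, which is the kind of annotation noise the method is designed to handle.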
Further, in step 1, the entity type set can use the 18 entity types defined in OntoNotes 5.0 plus one special "Other" entity type, for a total of 19 entity types, and the Flair named entity recognition tool is used to identify the types of the entities in the data set.
Further, in step 2, the segmented convolutional neural network (PCNN) is a neural network model that takes the feature vector sequence of the sentence word as input, and generates a sentence feature vector representation by convolution and segmentation pooling based on the positions of two entities in the sentence.
Further, in step 3, BERT is a deep neural network constructed based on a multi-head self-attention mechanism and pre-trained using a large number of corpora, and can output a feature vector representation with semantic information for an input text.
Further, in step 4 and step 5, the graph convolution neural network is a neural network that performs information propagation and aggregation on graph data through convolution operation, thereby extracting feature information.
Further, in step 6, the multi-source fusion attention mechanism is an attention-based information fusion scheme first proposed by the present invention. It fuses the semantic information of sentences, the knowledge context of entities, and the constraint information of entity relations to obtain the feature vector representation of a sentence packet.
Further, in step 7, the relation classifier is implemented as follows: it takes as input the feature vector representation of the sentence packet and the feature representation of each relation in the relation set, calculates a prediction score by a dot product operation, and then calculates the probability of classifying the sentence packet into each relation by a softmax operation.
Advantageous effects
Compared with the prior art, the invention has the following advantages:
First, by integrating the information about an entity in the knowledge base into the relation extraction model as a knowledge context supplement for the entity, the method helps the model judge whether a remote supervision annotation is correct and effectively reduces the data noise in remote supervision relation extraction.
Second, by using the constraint graph of entity types and relations between entities, the method connects different relations indirectly through entity types, helps information propagate between relations, and addresses the long-tail relation problem of remote supervision relation extraction.
Third, the method uses multi-source fusion attention to fuse the entity knowledge context and the constraint graph information with the sentence information in the packet. The resulting packet feature vector better reflects the relation of the target entity pair, so that the data noise problem and the long-tail relation problem of remote supervision relation extraction are addressed at the same time.
Drawings
FIG. 1 is a general framework schematic of the process of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
A remote supervision relation extraction method fusing knowledge and a constraint graph, as shown in FIG. 1, comprises the following steps:
Step 1: Collect, from the knowledge base, the one-hop or two-hop neighbor entities of the entities in the remote supervision data set. The entities in the remote supervision data set and their neighbor entities form an entity set, which is used to construct an entity neighbor graph. For the entities in the remote supervision data set, identify their entity types with the Flair named entity recognition tool to form an entity type set, and construct a constraint graph in combination with the set of relations between the entities. In the remote supervision data set, sentences with the same entity pair are combined into sentence packets.
Specifically, the entity neighbor graph is defined as a graph $K = \{E, N\}$, where $E$ denotes the set of entity nodes, that is, the entity set, and $N$ denotes the set of edges. If two entities $e_1, e_2 \in E$ appear together in a triple in the knowledge base, there is an edge $(e_1, e_2) \in N$.
The constraint graph is defined as a graph $G = \{T, R, C\}$, where $T$ is the set of entity type nodes. The 18 coarse entity types defined in OntoNotes 5.0 may be used; entity types not belonging to these 18 are represented by a special node "other", so that $T$ contains 19 entity type nodes. The Flair named entity recognition tool is used to identify the types of the entities in the data set.
$R$ is the set of relation nodes formed by all relations, and $C$ is the set of constraint edges. If entities $e_1$ and $e_2$ have entity types $t_{e_1}, t_{e_2} \in T$ and the relation $r \in R$ holds between them, then there is a constraint $(t_{e_1}, r, t_{e_2}) \in C$. Each constraint $(t_{e_1}, r, t_{e_2})$ corresponds to the two edges $(t_{e_1}, r)$ and $(r, t_{e_2})$.
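The two graph definitions can be sketched as follows. The toy triples, the type labels, and the choice to store neighbor-graph edges as unordered pairs are assumptions made for illustration; the source does not state whether the neighbor graph is directed.

```python
# Hypothetical toy data: KB triples and entity types as a NER tool might return them.
triples = [("Obama", "born_in", "Hawaii"), ("Hawaii", "part_of", "USA")]
entity_type = {"Obama": "PERSON", "Hawaii": "GPE", "USA": "GPE"}

def neighbor_graph(triples):
    """K = {E, N}: an edge joins two entities that share a KB triple."""
    nodes, edges = set(), set()
    for head, _, tail in triples:
        nodes.update((head, tail))
        edges.add(frozenset((head, tail)))  # assumption: undirected edge
    return nodes, edges

def constraint_graph(triples, entity_type):
    """G = {T, R, C}: constraint (t_h, r, t_t) yields edges (t_h, r) and (r, t_t)."""
    types, relations, constraints, edges = set(), set(), set(), set()
    for head, rel, tail in triples:
        t_h = entity_type.get(head, "other")  # unknown types map to "other"
        t_t = entity_type.get(tail, "other")
        types.update((t_h, t_t))
        relations.add(rel)
        constraints.add((t_h, rel, t_t))
        edges.update({(t_h, rel), (rel, t_t)})
    return types, relations, constraints, edges

E, N = neighbor_graph(triples)
T, R, C, D = constraint_graph(triples, entity_type)
```

Because relations are linked only through type nodes, two relations that share an entity type (here `born_in` and `part_of` via `GPE`) become indirectly connected in the constraint graph.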
Step 2: For the sentences in the sentence packets obtained in step 1, obtain the word embedding vector of each word with the word2vec tool, and obtain the feature vector representation of each sentence using a segmented convolutional neural network (PCNN) as the sentence encoder.
Specifically, consider a sentence in the packet, $s = \{w_1, w_2, \dots, w_{n_s}\}$, where $n_s$ is the length of $s$. The input for each word $w_i \in s$ consists of its word embedding vector and its position feature vectors. The word embedding vector $v_i$ can be obtained by pre-training with the word2vec tool and has dimension $d_w$. The position feature vectors $p_i^h, p_i^t \in \mathbb{R}^{d_p}$ are the embeddings of the two relative distances between word $w_i$ and the entities of the target entity pair $(e_h, e_t)$ in the sentence, each of dimension $d_p$. Relative distances are computed using the position at which each entity of the pair first appears in the sentence as the reference position.
The input representation $\mathbf{w}_i \in \mathbb{R}^{d}$ of word $w_i$, with $d = d_w + 2d_p$, is obtained by concatenation, as shown in Equation 1:
$$\mathbf{w}_i = [v_i; p_i^h; p_i^t] \tag{1}$$
where ";" denotes the vector concatenation operation. The input of sentence $s$ is represented as the matrix $X = [\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_{n_s}] \in \mathbb{R}^{n_s \times d}$.
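A minimal sketch of building the input matrix $X$ of Equation 1 is given below. Random vectors stand in for pre-trained word2vec embeddings and learned position embeddings, and the dimensions and the maximum distance `MAX_DIST` are hypothetical values chosen for the example.

```python
import random

random.seed(0)
D_W, D_P, MAX_DIST = 4, 2, 10  # hypothetical small dimensions

def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

tokens = ["Obama", "was", "born", "in", "Hawaii"]
head_pos, tail_pos = 0, 4  # first occurrence of each target entity

# Stand-ins for pre-trained word2vec vectors and position embeddings.
word_emb = {w: rand_vec(D_W) for w in tokens}
pos_emb = {dist: rand_vec(D_P) for dist in range(-MAX_DIST, MAX_DIST + 1)}

def input_matrix(tokens, head_pos, tail_pos):
    """Return X, one row [v_i; p_i^h; p_i^t] of length d_w + 2*d_p per word."""
    rows = []
    for i, w in enumerate(tokens):
        p_h = pos_emb[i - head_pos]  # embedding of distance to head entity
        p_t = pos_emb[i - tail_pos]  # embedding of distance to tail entity
        rows.append(word_emb[w] + p_h + p_t)  # list concatenation
    return rows

X = input_matrix(tokens, head_pos, tail_pos)
```

Each row has length $d = d_w + 2d_p$, so $X$ has shape $n_s \times d$ as in the text.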
The input representation matrix $X$ of sentence $s$ is encoded with the segmented convolutional neural network (PCNN) to obtain a sentence feature vector of fixed dimension.
The segmented convolutional neural network comprises a convolutional layer and a segmented max-pooling layer. The parameter matrix of the convolutional layer is $W \in \mathbb{R}^{w \times d}$, where $w$ is the length of the convolution sliding window. The sub-matrix $q_m$ of $X$ under the $m$-th sliding window is given by Equation 2:
$$q_m = X_{m-w+1:m} \quad (1 \le m \le l_s + w - 1) \tag{2}$$
where $l_s$ denotes the length of sentence $s$ (the same quantity as $n_s$ above), and $m-w+1:m$ is the index interval, within the word sequence of the original sentence, of the words under the sliding window.
The convolution of the sub-matrix $q_m$ with the kernel parameter matrix $W$ gives $c_m$, as shown in Equation 3:
$$c_m = W \otimes q_m \tag{3}$$
where $\otimes$ denotes the convolution operation.
During convolution, the window slides with a stride of 1, and the part of a window that extends beyond the sentence boundary is padded with zero vectors. The result is the feature vector $c \in \mathbb{R}^{l_s + w - 1}$ of the matrix $X$.
In practice, multiple convolution kernels are usually applied to the sentence representation matrix in order to capture different features of the sentence. The invention uses $d_c$ convolution kernels, denoted $\{W_1, W_2, \dots, W_{d_c}\}$. After convolution, the representation matrix $X$ yields $d_c$ feature vectors $\{c_1, c_2, \dots, c_{d_c}\}$.
The segmented max-pooling layer takes the positions of the entity pair in sentence $s$ as split points, divides each feature vector into 3 parts, and applies max pooling to each part separately. Specifically, any feature vector $c_i$ $(1 \le i \le d_c)$ is split into 3 sub-vectors $\{c_{i,1}; c_{i,2}; c_{i,3}\}$. Max pooling over each sub-vector gives a pooled feature vector $f_i$ of dimension 3, as shown in Equation 4:
$$f_i = [\max(c_{i,1}); \max(c_{i,2}); \max(c_{i,3})] \tag{4}$$
where $\max(\cdot)$ denotes the maximum operation.
After the $d_c$ feature vectors $\{c_1, c_2, \dots, c_{d_c}\}$ of $X$ have each passed through the segmented max-pooling layer, the pooled feature vectors are concatenated and passed through the activation function $\tanh(\cdot)$ to obtain the feature vector representation $\mathbf{s} \in \mathbb{R}^{3d_c}$ of $X$, as shown in Equation 5:
$$\mathbf{s} = \tanh(f_{1:d_c}) \tag{5}$$
where $f_{1:d_c}$ denotes the concatenation of the $d_c$ pooled feature vectors. The resulting $\mathbf{s}$ is the feature vector representation of the sentence $s$.
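The convolution and segmented max pooling of Equations 2 to 5 can be sketched in plain Python. How the split points map onto the padded feature vector is not specified in the text, so the segmentation used here (cutting right after each entity position, with `head_pos < tail_pos`) is an assumption, as are the tiny example inputs.

```python
import math

def conv_feature(X, W):
    """Slide kernel W (w x d) over X (l_s x d), stride 1, zero padding."""
    l_s, w, d = len(X), len(W), len(X[0])
    zero = [0.0] * d
    padded = [zero] * (w - 1) + X + [zero] * (w - 1)
    c = []
    for m in range(l_s + w - 1):      # l_s + w - 1 window positions
        window = padded[m:m + w]      # q_m = X_{m-w+1:m}
        c.append(sum(W[a][b] * window[a][b]
                     for a in range(w) for b in range(d)))
    return c

def piecewise_max(c, head_pos, tail_pos):
    """Split c at the two entity positions and max-pool each of the 3 parts."""
    parts = [c[:head_pos + 1], c[head_pos + 1:tail_pos + 1], c[tail_pos + 1:]]
    return [max(p) if p else 0.0 for p in parts]  # empty part -> 0.0

def pcnn_encode(X, kernels, head_pos, tail_pos):
    """Concatenate pooled vectors of all kernels, then apply tanh."""
    pooled = []
    for W in kernels:
        pooled.extend(piecewise_max(conv_feature(X, W), head_pos, tail_pos))
    return [math.tanh(v) for v in pooled]

X = [[1.0], [2.0], [3.0]]   # l_s = 3, d = 1
W = [[1.0], [1.0]]          # window length w = 2
c = conv_feature(X, W)
s = pcnn_encode(X, [W, W], head_pos=0, tail_pos=2)
```

With $d_c$ kernels the encoded sentence has $3d_c$ entries regardless of sentence length, which is what makes the representation fixed-dimensional.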
Step 3: Using BERT as the attribute encoder, collect from the knowledge base four kinds of entity attribute information for each entity in the entity set of step 1: entity name, entity alias, entity type, and entity description. For each entity, the attribute information is concatenated and input into BERT, and the output matrix is averaged over its column vectors to obtain the attribute vector of the entity.
Specifically, each entity $e_i \in E$ in the entity set obtains a corresponding attribute vector $a_i \in \mathbb{R}^{d_a}$, where $d_a$ is the dimension of the entity attribute vector. The attribute vector $a_i$ of entity $e_i$ is calculated as shown in Equation 6:
$$a_i = \mathrm{Mean}(\mathrm{BERT}([\mathrm{CLS}]\ name_i\ [\mathrm{SEP}]\ alias_i\ [\mathrm{SEP}]\ type_i\ [\mathrm{SEP}]\ description_i)) \tag{6}$$
where $\mathrm{Mean}(\cdot)$ denotes the column averaging operation; $\mathrm{BERT}(\cdot)$ denotes the output matrix calculated by the pre-trained BERT model; $name_i$ is the entity name of $e_i$; $alias_i$ is the entity alias of $e_i$; $type_i$ is the entity type of $e_i$ in the knowledge base; $description_i$ is the entity description of $e_i$; the attribute items are separated by the symbol [SEP]; and [CLS] is the start identifier added to the BERT input.
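The input construction and the averaging step of Equation 6 can be sketched without loading a real model. `toy_encoder` below is a hypothetical stand-in for BERT that only mimics the shape of its output (one vector per token), and joining multiple aliases with a comma is likewise an assumption.

```python
def build_attribute_text(name, aliases, ent_type, description):
    """Join attributes as [CLS] name [SEP] alias [SEP] type [SEP] description."""
    fields = ["[CLS] " + name, ", ".join(aliases), ent_type, description]
    return " [SEP] ".join(fields)

def column_mean(matrix):
    """Average a (tokens x d_a) matrix over its rows into one d_a vector."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]

def toy_encoder(text, d_a=3):
    """Hypothetical stand-in for BERT: one deterministic vector per token."""
    return [[float((len(tok) + k) % 5) for k in range(d_a)]
            for tok in text.split()]

text = build_attribute_text("Hawaii", ["Aloha State"], "GPE", "A U.S. state")
a_i = column_mean(toy_encoder(text))  # attribute vector of dimension d_a
```

In a real setting the stand-in would be replaced by the token-level hidden states of a pre-trained BERT model, and the averaging would run over those token vectors.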
Step 4: Construct an adjacency matrix from the entity neighbor graph of step 1. With the adjacency matrix and the entity attribute vectors obtained in step 3 as input, a neighbor graph encoder built from a graph convolutional neural network produces the knowledge context vector representation of the target entity.
The target entity may be any entity in the entity set of step 1. For the entity neighbor graph $K = \{E, N\}$, with entity node set $E$ and edge set $N$, the adjacency matrix $A \in \mathbb{R}^{|E| \times |E|}$ is calculated as shown in Equation 7:
$$A_{ij} = \begin{cases} 1, & (v_i, v_j) \in N \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
where $|E|$ is the number of entity nodes; $v_i, v_j \in E$ are any two entity nodes in the entity node set; and $(v_i, v_j) \in N$ indicates that there is an edge between the entity nodes $v_i$ and $v_j$ in the entity neighbor graph.
A two-layer graph convolutional neural network is selected as the neighbor graph encoder, and the entity attribute vector $a_i$ obtained in step 3 is used as the initial vector of entity $e_i \in E$ in the entity neighbor graph. For node $v_i \in E$, the output $h_i^{(k)}$ of the $k$-th graph convolution layer is calculated as shown in Equation 8:
$$h_i^{(k)} = \mathrm{ReLU}\left(\sum_{j=1}^{|E|} A_{ij} W^{(k)} h_j^{(k-1)} + b^{(k)}\right) \tag{8}$$
where $\mathrm{ReLU}(\cdot)$ is a nonlinear activation function, $|E|$ is the number of entity nodes, $W^{(k)}$ is the weight matrix of the $k$-th layer of the neighbor graph encoder, $b^{(k)}$ is the bias term of the $k$-th layer, $h_j^{(k-1)}$ is the feature vector output by layer $k-1$ for the $j$-th entity node, and $h_i^{(0)} = a_i$.
The output of the last layer of the neighbor graph encoder is taken as the knowledge context vector representation $k_i \in \mathbb{R}^{d_n}$ of entity $e_i \in E$, where $d_n$ is the output vector dimension of the last layer of the neighbor graph encoder.
For the entity node set $E$ of the entity neighbor graph $K$, the neighbor graph encoder yields a knowledge context matrix $\mathbf{K} \in \mathbb{R}^{|E| \times d_n}$. Each row vector of the matrix $\mathbf{K}$ is the knowledge context vector representation of one entity, which fuses the entity's neighbors and their attribute information.
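The adjacency matrix and one propagation step of a graph convolution layer can be sketched as below. This uses the plain, unnormalized form with ReLU that matches the terms named in the text; treating edges as symmetric and omitting self-loops and degree normalization are assumptions, since the source does not specify them.

```python
def adjacency(nodes, edges):
    """A[i][j] = 1 if an edge joins nodes i and j, else 0."""
    index = {v: i for i, v in enumerate(nodes)}
    A = [[0.0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = 1.0
        A[index[v]][index[u]] = 1.0  # assumption: undirected graph
    return A

def gcn_layer(A, H, W, b):
    """h_i = ReLU(sum_j A_ij * (W h_j) + b), for every node i."""
    out = []
    for i in range(len(A)):
        # Aggregate neighbor features first; W is linear, so the order is free.
        agg = [sum(A[i][j] * H[j][c] for j in range(len(H)))
               for c in range(len(H[0]))]
        out.append([max(0.0, sum(W[r][c] * agg[c] for c in range(len(agg))) + b[r])
                    for r in range(len(W))])
    return out

nodes = ["a", "b", "c"]
A = adjacency(nodes, [("a", "b"), ("b", "c")])
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # stand-in attribute vectors
W_id, b0 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
H1 = gcn_layer(A, H0, W_id, b0)
H2 = gcn_layer(A, H1, W_id, b0)              # two layers reach two-hop neighbors
```

After two layers, node `a` has received information originating from node `c`, which is consistent with collecting both one-hop and two-hop neighbors in step 1.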
Step 5: Construct an adjacency matrix from the constraint graph of step 1. With the adjacency matrix and the vector representations of the entity types and relations as input, a constraint graph encoder built from a graph convolutional neural network produces the vector representations of the entity types and relations.
Specifically, for the constraint graph $G = \{T, R, C\}$, the nodes of the graph comprise entity type nodes and relation nodes, so the node set of the constraint graph is $V_G = T \cup R$. The edge set of the constraint graph $G$ is denoted $D_G$. For each constraint $(t_{e_1}, r, t_{e_2}) \in C$, $D_G$ contains the two corresponding edges $(t_{e_1}, r)$ and $(r, t_{e_2})$. The adjacency matrix $A^G \in \mathbb{R}^{|V_G| \times |V_G|}$ of the constraint graph $G$ is calculated as shown in Equation 9:
$$A^G_{ij} = \begin{cases} 1, & (v_i, v_j) \in D_G \\ 0, & \text{otherwise} \end{cases} \tag{9}$$
where $|V_G|$ is the number of nodes of the constraint graph; $v_i, v_j \in V_G$ are any two nodes in the node set of the constraint graph; and $(v_i, v_j) \in D_G$ indicates that there is an edge between nodes $v_i$ and $v_j$ in the constraint graph.
A two-layer graph convolutional neural network is selected as the constraint graph encoder, and the initial input vector of each node in the node set $V_G$ of the constraint graph is randomly initialized. For node $v_i \in V_G$, the output $g_i^{(k)}$ of the $k$-th graph convolution layer is calculated as shown in Equation 10:
$$g_i^{(k)} = \mathrm{ReLU}\left(\sum_{j=1}^{|V_G|} A^G_{ij} M^{(k)} g_j^{(k-1)} + q^{(k)}\right) \tag{10}$$
where $\mathrm{ReLU}(\cdot)$ is a nonlinear activation function, $|V_G|$ is the number of nodes of the constraint graph, $M^{(k)}$ is the weight matrix of the $k$-th layer of the constraint graph encoder, $q^{(k)}$ is the bias term of the $k$-th layer, and $g_j^{(k-1)}$ is the feature vector output by layer $k-1$ for the $j$-th node.
The output of the last layer of the constraint graph encoder is taken as the feature vector representation of node $v_i \in V_G$. For the node set $V_G$ of the constraint graph $G$, the constraint graph encoder yields a feature matrix $\mathbf{V}_G \in \mathbb{R}^{|V_G| \times d_g}$, where $d_g$ is the output vector dimension of the last layer of the constraint graph encoder. Each row vector of the feature matrix $\mathbf{V}_G$ is the vector representation of one node in the node set $V_G$ of the constraint graph $G$.
The node set $V_G$ comprises entity type nodes and relation nodes. Splitting the feature matrix $\mathbf{V}_G$ gives the feature matrix $T' \in \mathbb{R}^{|n_t| \times d_g}$ of the entity type node set $T$ and the feature matrix $R' \in \mathbb{R}^{|n_r| \times d_g}$ of the relation node set $R$, where $|n_t|$ is the number of entity types in the entity type set $T$ and $|n_r|$ is the number of relations in the relation node set $R$.
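Building the constraint graph adjacency matrix and splitting the encoder output into type rows and relation rows can be sketched as follows. Ordering the nodes with all type nodes first, and filling the matrix one direction per edge exactly as the edges are listed, are assumptions made for the example.

```python
def constraint_adjacency(types, relations, constraints):
    """Build A^G over V_G = T ∪ R, with type nodes ordered before relation nodes."""
    nodes = sorted(types) + sorted(relations)
    index = {v: i for i, v in enumerate(nodes)}
    A = [[0.0] * len(nodes) for _ in nodes]
    for t_h, r, t_t in constraints:
        A[index[t_h]][index[r]] = 1.0   # edge (t_h, r)
        A[index[r]][index[t_t]] = 1.0   # edge (r, t_t)
    return nodes, A

def split_features(V, n_t):
    """Split the encoder output into type rows T' and relation rows R'."""
    return V[:n_t], V[n_t:]

types, relations = {"PERSON", "GPE"}, {"born_in"}
constraints = {("PERSON", "born_in", "GPE")}
nodes, A_G = constraint_adjacency(types, relations, constraints)

# Stand-in for the encoder output: one feature row per node in `nodes`.
V_G = [[1.0], [2.0], [3.0]]
T_feat, R_feat = split_features(V_G, len(types))
```

Keeping a fixed node order makes the split a simple slice, so each row of `T_feat` and `R_feat` can be looked up by the index of its type or relation.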
And 6: and (4) expressing the sentence characteristic vector obtained in the step (2), expressing the entity context vector obtained in the step (4) and expressing the entity type and relationship vector obtained in the step (5) as input, and calculating to obtain the characteristic vector expression of the sentence packet through a multi-source fusion attention mechanism.
In particular, for sentence packets
Figure BDA00038675228100000813
n b The sentence number of the sentence packet B corresponds to the target entity pair of (e) h ,e t ) And the corresponding relation label is R belongs to R.
For each sentence s in sentence packet B i E.g. B, and obtaining a characteristic vector s of the sentence by a sentence coder i (ii) a For the target entity pair (e) corresponding to the sentence packet h ,e t ) The entity e is obtained by an entity attribute encoder and a neighbor graph encoder h And e t Corresponding knowledge context vector representation
Figure BDA0003867522810000091
Target entity pair for sentence packet (e) h ,e t ) And a relation label r, obtaining a corresponding entity type vector through a constraint graph encoder
Figure BDA0003867522810000092
And a relationship vector R ∈ R'.
Multi-source fusion attention machineAnd adding the knowledge context vector and the entity type vector of the entity into the feature vectors of the sentences and the relations, thereby providing more knowledge background information and constraint information for relation extraction. In particular, the sentence s i Final vector representation of e B
Figure BDA0003867522810000093
By concatenating sentence feature vectors s i Entity type vector of target entity pair
Figure BDA0003867522810000094
And knowledge context vector of target entity pair
Figure BDA0003867522810000095
Obtained as shown in formula 11:
Figure BDA0003867522810000096
wherein,
Figure BDA0003867522810000097
d s =3d c +2d g +2d n vector s i Has a dimension of 3d c Vector of motion
Figure BDA0003867522810000098
Has a dimension of d g Vector of motion
Figure BDA0003867522810000099
Has a dimension of d n (ii) a "; "denotes a vector stitching operation.
The final vector representation R of the relation R ∈ R f The calculation method is as follows:
Figure BDA00038675228100000910
where Linear (·) denotes a Linear fully-connected layer, the purpose of which is to align the vector r f Sum vector
Figure BDA00038675228100000911
The vector dimension of (2).
"; "denotes a vector stitching operation.
After the final vector representation $\hat{\mathbf{s}}_i$ of each sentence $s_i \in B$ and the final vector representation $\mathbf{r}_f$ of the relation label $r$ are obtained, the feature vector representation $\mathbf{b}$ of the sentence packet $B$ is obtained through the attention mechanism, as shown in formula 13:

$$\mathbf{b} = \sum_{i=1}^{n_b} \alpha_i \hat{\mathbf{s}}_i \tag{13}$$

where $n_b$ is the number of sentences in the sentence packet $B$, and $\alpha_i$ is the attention weight of the final vector representation $\hat{\mathbf{s}}_i$ of the sentence $s_i \in B$, calculated as shown in formula 14:

$$\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{n_b} \exp(e_j)} \tag{14}$$

where $e_i$ is the matching score between the sentence vector $\hat{\mathbf{s}}_i$ and the relation label $r$ of the sentence packet, calculated as shown in formula 15:

$$e_i = \hat{\mathbf{s}}_i \cdot \mathbf{r}_f \tag{15}$$

where "·" denotes the vector dot product operation.
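By way of non-limiting illustration, the attention step of formulas 13 to 15 may be sketched as follows, assuming the final sentence vectors and the final relation vector have already been computed and share the same dimension. The function name `bag_representation` is hypothetical, and the subtraction of the maximum score before exponentiation is a numerical-stability convenience added in this sketch, not something stated in the text.

```python
import math


def bag_representation(fused_sentences, r_f):
    """Return (bag vector, attention weights) for one sentence packet."""
    # Formula 15: matching score e_i = dot(s_hat_i, r_f)
    scores = [sum(a * b for a, b in zip(s_hat, r_f)) for s_hat in fused_sentences]
    # Formula 14: softmax over the sentences of the packet
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    alphas = [x / total for x in exps]
    # Formula 13: attention-weighted sum of the final sentence vectors
    dim = len(r_f)
    bag = [
        sum(alpha * s_hat[d] for alpha, s_hat in zip(alphas, fused_sentences))
        for d in range(dim)
    ]
    return bag, alphas


# Two sentences; the first matches the relation vector better and gets more weight.
bag, alphas = bag_representation([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```

Sentences whose final vector aligns with the relation vector receive larger weights, which is how the mechanism down-weights noisy sentences in a packet.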
Step 7: Using the feature vector representation of the sentence packet obtained in step 6, predict the relation label of the sentence packet through a relation classifier.
After the feature vector $\mathbf{b}$ of the sentence packet $B$ is obtained through the multi-source fusion attention mechanism, the relation classifier predicts the relation label of the sentence packet $B$. The prediction score $o_i$ of the sentence packet $B$ for each relation $r_i$ in the relation set $R$ is calculated as shown in formula 16:

$$o_i = \mathbf{r}_f^{i} \cdot \mathbf{b} + d_i \tag{16}$$

where $\mathbf{r}_f^{i}$ represents the final vector representation of the relation $r_i$, and $d_i$ represents a bias term.
After the prediction score of the sentence packet $B$ for each relation $r_i$ in the relation set $R$ is obtained, the probability that the sentence packet $B$ is classified into the relation $r_i$ is calculated with the softmax function, as shown in formula 17:

$$P(r_i \mid B; \theta) = \frac{\exp(o_i)}{\sum_{j=1}^{n_r} \exp(o_j)} \tag{17}$$

where $n_r$ represents the number of relations in the relation set $R$, $o_j$ represents the prediction score of the sentence packet $B$ for the $j$-th relation $r_j$ in the relation set $R$, and $\theta$ denotes the parameters of the relation extraction model.
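By way of non-limiting illustration, the relation classifier of step 7 may be sketched as follows. The sketch assumes that the prediction score of each relation is the dot product of the packet vector with that relation's final vector plus a scalar bias, and that the scores are normalized with softmax; the function name `relation_probabilities` and the toy values are hypothetical.

```python
import math


def relation_probabilities(bag, relation_vectors, biases):
    """Score the packet vector against every relation, then apply softmax."""
    # Assumed score form: dot(final relation vector, packet vector) + bias
    scores = [
        sum(r * b for r, b in zip(rel_vec, bag)) + bias
        for rel_vec, bias in zip(relation_vectors, biases)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(o - top) for o in scores]
    total = sum(exps)
    return [x / total for x in exps]


probs = relation_probabilities(
    bag=[1.0, 0.0],
    relation_vectors=[[2.0, 0.0], [0.0, 2.0]],
    biases=[0.0, 0.0],
)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

The predicted relation label is the index with the highest probability.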
Further, the relation extraction model of the invention is trained with the cross-entropy loss function, calculated as shown in formula 18:
Figure BDA0003867522810000106
where $n_B$ denotes the number of sentence packets in the training set used, $r_i$ denotes the relation label of the sentence packet $B_i$, and $\theta$ denotes the parameters of the relation extraction model used in the present invention, comprising all the parameters in the above steps: the parameters of the sentence encoder in step 2, the parameters of the attribute encoder in step 3, the parameters of the entity neighbor graph encoder in step 4, the parameters of the constraint graph encoder in step 5, the parameters of the multi-source fusion attention mechanism in step 6, and the parameters of the relation classifier in step 7. The parameters include weight matrices and bias terms.
In the present invention, the relation extraction model may be trained with the mini-batch stochastic gradient descent (SGD) optimization algorithm to minimize the loss function $J(\theta)$, thereby optimizing the parameters.
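By way of non-limiting illustration, the training objective and a mini-batch SGD update may be sketched as follows. The sketch assumes the cross-entropy loss is taken as the mean negative log-probability of the labeled relation over the sentence packets (whether the loss is averaged or summed is an assumption here), and that gradients are supplied by some external differentiation routine. All function names are hypothetical.

```python
import math
import random


def cross_entropy_loss(probabilities, labels):
    """Mean negative log-probability of the labeled relation of each packet."""
    return -sum(math.log(p[y]) for p, y in zip(probabilities, labels)) / len(labels)


def minibatches(items, batch_size, seed=0):
    """Shuffle the training packets and split them into mini-batches."""
    order = list(items)
    random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def sgd_step(params, grads, learning_rate):
    """One SGD update: theta <- theta - learning_rate * gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


loss = cross_entropy_loss([[0.5, 0.5], [0.25, 0.75]], [0, 1])
batches = minibatches(range(5), batch_size=2)
updated = sgd_step([1.0, 2.0], [0.5, -1.0], learning_rate=0.1)
```

In practice the gradient of $J(\theta)$ would be obtained by backpropagation through all encoders; the update rule itself is unchanged.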

Claims (5)

1. A remote supervision relation extraction method fusing knowledge and constraint graphs is characterized by comprising the following steps:
step 1: collecting, in a knowledge base, the neighbor entities of the entities in the remote supervision data set, the neighbor entities comprising one-hop neighbor entities and two-hop neighbor entities; forming an entity set from the entities in the remote supervision data set and their neighbor entities, constructing an entity neighbor graph from the entity set, and constructing a constraint graph in combination with the set of relations between the entities; for the remote supervision data set, combining the sentences having the same entity pair into a sentence packet;
step 2: acquiring a word embedding vector of each word of each sentence in the sentence packet, and a feature vector representation of the sentence;
step 3: using an attribute encoder, collecting in the knowledge base the entity attribute information of each entity in the entity set, the entity attribute information comprising an entity name, an entity type and an entity description; each entity concatenates its attribute information and inputs it into the attribute encoder, and the attribute vector of the entity is obtained by column-wise averaging of the output matrix;
step 4: constructing an adjacency matrix from the entity neighbor graph, and, taking the adjacency matrix and the entity attribute vectors as input, obtaining the knowledge context vector representation of a target entity through a neighbor graph encoder built from a graph convolutional neural network;
step 5: constructing an adjacency matrix from the constraint graph, and, taking the adjacency matrix and the vector representations of the entity types and relations as input, obtaining the vector representations of the entity types and relations through a constraint graph encoder built from a graph convolutional neural network;
step 6: taking the sentence feature vector representations, the entity context vector representations, and the vector representations of the entity types and relations as input, and computing the feature vector representation of the sentence packet through a multi-source fusion attention mechanism;
step 7: predicting the relation label of the sentence packet from the feature vector representation of the sentence packet through a relation classifier.
2. The remote supervision relation extraction method fusing knowledge and constraint graphs as claimed in claim 1, wherein in step 1, the knowledge base contains entity pairs, the relations between the entities of each pair, and the attribute information of each entity: entity name, entity alias, entity type, and entity description;
the remote supervision data set is a training corpus labeled by the remote supervision method, in which the entity pairs and their corresponding relations in the knowledge base are used to label natural language text: if a triple <head entity, tail entity, relation> exists in the knowledge base, any sentence containing both the head entity and the tail entity is assumed to express the relation of that triple, thereby yielding labeled data.
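By way of non-limiting illustration, the remote supervision labeling heuristic and the grouping of sentences into sentence packets may be sketched as follows. Matching entities by plain substring search is a simplifying assumption of this sketch (a practical system would use entity linking), and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def label_by_remote_supervision(sentences, triples):
    """Group sentences into packets keyed by (head, tail, relation).

    A sentence is assigned to a triple whenever it mentions both the head
    entity and the tail entity (the remote supervision assumption).
    """
    packets = defaultdict(list)
    for sentence in sentences:
        for head, tail, relation in triples:
            if head in sentence and tail in sentence:
                packets[(head, tail, relation)].append(sentence)
    return dict(packets)


# Toy knowledge base and corpus, for illustration only.
toy_triples = [("Barack Obama", "Hawaii", "place_of_birth")]
toy_sentences = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama visited Hawaii last year.",
    "Hawaii is an island state.",
]
packets = label_by_remote_supervision(toy_sentences, toy_triples)
```

The second toy sentence is labeled `place_of_birth` although it does not express that relation, which is the kind of labeling noise the attention mechanism over the sentence packet is meant to mitigate.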
3. The remote supervision relation extraction method fusing knowledge and constraint graphs as claimed in claim 1, wherein in step 2, the word embedding vector of each word in a sentence of the sentence packet is obtained with the word2vec tool, and the feature vector representation of the sentence is obtained by using a segmented convolutional neural network as the sentence encoder, wherein the segmented convolutional neural network is a neural network model that takes the feature vector sequence of the words in the sentence as input and generates the sentence feature vector representation by convolution and by segmented pooling based on the positions of the two entities in the sentence.
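By way of non-limiting illustration, the segmented pooling based on the positions of the two entities may be sketched as follows. The sketch assumes that each convolution feature vector is split immediately after each entity position, and that an empty segment contributes 0.0; both choices, and the function names, are assumptions made for this sketch. The tanh activation follows the detailed formulation of the segmented max pooling layer.

```python
import math


def segmented_max_pool(feature_vector, head_pos, tail_pos):
    """Split one convolution feature vector at the entity positions and max-pool each part."""
    first, second = sorted((head_pos, tail_pos))
    segments = [
        feature_vector[:first + 1],
        feature_vector[first + 1:second + 1],
        feature_vector[second + 1:],
    ]
    # Assumption: an empty segment (entity at the sentence end) pools to 0.0
    return [max(segment) if segment else 0.0 for segment in segments]


def sentence_feature_vector(feature_vectors, head_pos, tail_pos):
    """Concatenate the pooled triples of all kernels and apply tanh."""
    pooled = []
    for feature_vector in feature_vectors:
        pooled.extend(segmented_max_pool(feature_vector, head_pos, tail_pos))
    return [math.tanh(value) for value in pooled]


pooled_example = segmented_max_pool([1.0, 5.0, 2.0, 7.0, 3.0, 4.0], head_pos=1, tail_pos=3)
sentence_vec = sentence_feature_vector(
    [[1.0, 5.0, 2.0, 7.0, 3.0, 4.0], [0.0, -1.0, 2.0, 0.5, 0.1, 0.2]], 1, 3
)
```

With $d_c$ convolution kernels the resulting sentence vector has $3d_c$ components, here 3 × 2 = 6.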
4. The remote supervision relation extraction method fusing knowledge and constraint graphs as claimed in claim 1, wherein in step 1, the entity neighbor graph is defined as a graph $K = \{E, N\}$, where $E$ represents the set of entity nodes, that is, the entity set, and $N$ represents the set of edges; if two entities $e_1, e_2$ in the set $E$ appear together in a triple in the knowledge base, there is an edge $(e_1, e_2) \in N$;
the constraint graph is defined as a graph $G = \{T, R, C\}$, where $T$ is the set of entity type nodes, the types of the entities in the data set being identified with the Flair named entity recognition tool;
$R$ is the set of relation nodes formed by all relations, and $C$ is the set of constraint edges; if the entity types of the entities $e_1, e_2$ are $t_{e_1}, t_{e_2}$ and the entities $e_1, e_2$ have the relation $r$, then there is a constraint $(t_{e_1}, r, t_{e_2}) \in C$; each constraint $(t_{e_1}, r, t_{e_2})$ corresponds to the two edges $(t_{e_1}, r)$ and $(r, t_{e_2})$;
in step 2, for the sentence packet obtained in step 1, for a sentence $s = \{w_1, w_2, \ldots, w_{n_s}\}$ in the packet, where $n_s$ is the length of the sentence $s$, the input of each word $w_i \in s$ consists of its word embedding vector and its position feature vectors, wherein the word embedding vector $\mathbf{v}_i$ has dimension $d_w$; the position feature vectors $\mathbf{p}_i^{h}, \mathbf{p}_i^{t}$ are the embedded vector representations of the two relative distances between the word $w_i$ and the target entity pair $(e_h, e_t)$ in the sentence, each with dimension $d_p$; the relative distances of the other words are calculated taking the position at which each entity of the pair first appears in the sentence as the reference position;
the input representation $\mathbf{w}_i \in \mathbb{R}^{d}$ of the word $w_i$ is obtained by concatenation, where $d = d_w + 2d_p$ and $\mathbb{R}^{d}$ denotes the space of $d$-dimensional vectors; the input representation $\mathbf{w}_i$ is shown in formula 1:

$$\mathbf{w}_i = [\mathbf{v}_i; \mathbf{p}_i^{h}; \mathbf{p}_i^{t}] \tag{1}$$

wherein ";" represents the vector concatenation operation; the input of the sentence $s$ is represented as a matrix $X = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_{n_s}] \in \mathbb{R}^{n_s \times d}$;
the input representation matrix $X$ of the sentence $s$ is encoded with a segmented convolutional neural network to obtain a sentence feature vector of fixed dimension; the segmented convolutional neural network comprises a convolutional layer and a segmented max pooling layer; the parameter matrix of the convolutional layer is expressed as $W \in \mathbb{R}^{w \times d}$, where $w$ denotes the length of the convolution sliding window; the sub-matrix $q_m$ of the matrix $X$ under the $m$-th sliding window is shown in formula 2:

$$q_m = X_{m-w+1:m} \quad (1 \le m \le l_s + w - 1) \tag{2}$$

wherein $l_s$ represents the length of the sentence $s$, and $m-w+1:m$ represents the index interval, within the word sequence of the original sentence, of the words under the sliding window;
the convolution $c_m$ of the sub-matrix $q_m$ with the parameter matrix $W$ of the convolution kernel is shown in formula 3:

$$c_m = W \otimes q_m \tag{3}$$

wherein $\otimes$ represents the convolution operation;
in the convolution process, the sliding convolution is performed with a stride of 1, and the part of a convolution window that exceeds the sentence boundary is filled with zero vectors, finally yielding a feature vector $c \in \mathbb{R}^{l_s + w - 1}$ of the representation matrix $X$; with $d_c$ convolution kernels, represented as the set $\{W_1, W_2, \ldots, W_{d_c}\}$, the representation matrix $X$ corresponds, after the convolution calculation, to $d_c$ feature vectors $\{c_1, c_2, \ldots, c_{d_c}\}$;
the segmented max pooling layer of the segmented convolutional neural network takes the positions of the entity pair in the sentence $s$ as segmentation points, divides each feature vector into 3 parts, and then applies the max pooling operation to each part separately; any feature vector $c_i \in \{c_1, c_2, \ldots, c_{d_c}\}$ yields 3 feature sub-vectors after segmentation: $\{c_{i,1}; c_{i,2}; c_{i,3}\}$; max pooling is performed on each feature sub-vector to obtain a pooled feature vector $f_i$ of dimension 3, as shown in formula 4:

$$f_i = [\max(c_{i,1}); \max(c_{i,2}); \max(c_{i,3})] \tag{4}$$

where $\max(\cdot)$ denotes the maximum operation;
after the $d_c$ feature vectors $\{c_1, c_2, \ldots, c_{d_c}\}$ corresponding to the matrix $X$ have each passed through the segmented max pooling layer, the resulting pooled feature vectors are concatenated and passed through the activation function $\tanh(\cdot)$ to obtain the feature vector representation $\mathbf{s} \in \mathbb{R}^{3d_c}$ of $X$, as shown in formula 5:

$$\mathbf{s} = \tanh([f_1; f_2; \ldots; f_{d_c}]) \tag{5}$$

wherein $f_1, f_2, \ldots, f_{d_c}$ denote the $d_c$ pooled feature vectors; the vector $\mathbf{s}$ thus obtained is the feature vector representation of the sentence $s$;
in step 3, BERT is used as the attribute encoder; for each entity in the entity set of step 1, four items of entity attribute information are collected in the knowledge base: entity name, entity alias, entity type and entity description; each entity concatenates its attribute information and inputs it into BERT, and the attribute vector of the entity is obtained by column-wise averaging of the output matrix;
for each entity $e_i \in E$ in the entity set, the corresponding attribute vector $\mathbf{a}_i \in \mathbb{R}^{d_a}$ is obtained, where $d_a$ represents the dimension of the entity attribute vector; the attribute vector $\mathbf{a}_i$ of the entity $e_i$ is calculated as follows:

$$\mathbf{a}_i = \mathrm{Mean}\big(\mathrm{BERT}(\text{[CLS]}\ \mathrm{Name}_i\ \text{[SEP]}\ \mathrm{Alias}_i\ \text{[SEP]}\ \mathrm{Type}_i\ \text{[SEP]}\ \mathrm{Description}_i)\big) \tag{6}$$

wherein Mean(·) represents the column-wise averaging operation; BERT(·) represents the output matrix computed with the BERT pre-trained model; $\mathrm{Name}_i$ represents the entity name of the entity $e_i$; $\mathrm{Alias}_i$ represents the entity alias of the entity $e_i$; $\mathrm{Type}_i$ represents the entity type of the entity $e_i$ in the knowledge base; $\mathrm{Description}_i$ represents the entity description of the entity $e_i$; the items of attribute information are separated by the symbol [SEP]; [CLS] is the start identifier added when inputting into BERT;
in step 4, an adjacency matrix is constructed from the entity neighbor graph of step 1, and, with the adjacency matrix and the entity attribute vectors obtained in step 3 as input, the knowledge context vector representation of a target entity is obtained through a neighbor graph encoder built from a graph convolutional neural network;
wherein the target entity is any entity in the entity set of step 1; for the entity neighbor graph $K = \{E, N\}$, whose set of entity nodes is $E$ and whose set of edges is $N$, the adjacency matrix $A \in \mathbb{R}^{|E| \times |E|}$ is calculated as shown in formula 7:

$$A_{ij} = \begin{cases} 1, & (v_i, v_j) \in N \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

wherein $|E|$ represents the number of entity nodes; $v_i, v_j \in E$ represent any two entity nodes in the entity node set; $(v_i, v_j) \in N$ indicates that there is an edge between the entity nodes $v_i$ and $v_j$ in the entity neighbor graph;
a two-layer graph convolutional neural network is selected as the neighbor graph encoder, and the entity attribute vector $\mathbf{a}_i$ obtained in step 3 is used as the initial vector of the entity $e_i \in E$ in the entity neighbor graph; for a node $v_i \in E$, the output $\mathbf{h}_i^{(k)}$ of the $k$-th graph convolution layer is calculated as shown in formula 8:

$$\mathbf{h}_i^{(k)} = \mathrm{Relu}\Big(\sum_{j=1}^{|E|} A_{ij} W^{(k)} \mathbf{h}_j^{(k-1)} + b^{(k)}\Big) \tag{8}$$

wherein Relu(·) is a nonlinear activation function, $|E|$ represents the number of entity nodes, $W^{(k)}$ represents the weight matrix of the $k$-th layer of the neighbor graph encoder, $b^{(k)}$ represents the bias term of the $k$-th layer, $\mathbf{h}_j^{(k-1)}$ represents the feature vector output by layer $k-1$ for the $j$-th entity node, and $\mathbf{h}_j^{(0)} = \mathbf{a}_j$;
the output of the last layer of the neighbor graph encoder is selected as the knowledge context representation vector $\mathbf{k}_i \in \mathbb{R}^{d_n}$ of the entity $e_i \in E$, where $d_n$ represents the output vector dimension of the last layer of the neighbor graph encoder;
for the entity node set $E$ of the entity neighbor graph $K$, the neighbor graph encoder yields a knowledge context matrix $K \in \mathbb{R}^{|E| \times d_n}$; each row vector in the matrix $K$ corresponds to the knowledge context vector representation of one entity, which integrates the entity's neighbors and their attribute information;
in step 5, an adjacency matrix is constructed from the constraint graph of step 1, and, with the adjacency matrix and the vector representations of the entity types and relations as input, the vector representations of the entity types and relations are obtained through a constraint graph encoder built from a graph convolutional neural network;
for the constraint graph $G = \{T, R, C\}$, the nodes in the graph comprise entity type nodes and relation nodes, and the node set of the constraint graph is $V_G = T \cup R$; the edge set of the constraint graph $G$ is defined as $D_G$, and for each constraint $(t_{e_1}, r, t_{e_2}) \in C$, $D_G$ contains the two corresponding edges $(t_{e_1}, r)$ and $(r, t_{e_2})$; the adjacency matrix $\hat{A} \in \mathbb{R}^{|V_G| \times |V_G|}$ of the constraint graph $G$ is calculated as shown in formula 9:

$$\hat{A}_{ij} = \begin{cases} 1, & (v_i, v_j) \in D_G \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

wherein $|V_G|$ represents the number of nodes of the constraint graph; $v_i, v_j \in V_G$ are any two nodes in the node set of the constraint graph; $(v_i, v_j) \in D_G$ indicates that there is an edge between the nodes $v_i$ and $v_j$ in the constraint graph;
a two-layer graph convolutional neural network is selected as the constraint graph encoder, and the initial input vector of each node in the node set $V_G$ of the constraint graph is randomly initialized; for a node $v_i \in V_G$, the output $\mathbf{g}_i^{(k)}$ of the $k$-th graph convolution layer is calculated as shown in formula 10:

$$\mathbf{g}_i^{(k)} = \mathrm{Relu}\Big(\sum_{j=1}^{|V_G|} \hat{A}_{ij} M^{(k)} \mathbf{g}_j^{(k-1)} + q^{(k)}\Big) \tag{10}$$

wherein Relu(·) is a nonlinear activation function, $|V_G|$ represents the number of nodes of the constraint graph, $M^{(k)}$ represents the weight matrix of the $k$-th layer of the constraint graph encoder, $q^{(k)}$ represents the bias term of the $k$-th layer, and $\mathbf{g}_j^{(k-1)}$ represents the feature vector output by layer $k-1$ for the $j$-th node;
the output of the last layer of the constraint graph encoder is selected as the feature vector representation of the node $v_i \in V_G$; for the node set $V_G$ of the constraint graph $G$, the constraint graph encoder yields a feature matrix $\mathbf{V}_G \in \mathbb{R}^{|V_G| \times d_g}$, where $d_g$ represents the output vector dimension of the last layer of the constraint graph encoder; each row vector in the feature matrix $\mathbf{V}_G$ corresponds to the vector representation of one node in the node set $V_G$ of the constraint graph $G$;
the node set $V_G$ comprises entity type nodes and relation nodes; by splitting the feature matrix $\mathbf{V}_G$, the feature matrix $T' \in \mathbb{R}^{n_t \times d_g}$ of the entity type node set $T$ and the feature matrix $R' \in \mathbb{R}^{n_r \times d_g}$ of the relation node set $R$ are obtained, where $n_t$ represents the number of entity types in the entity type node set $T$ and $n_r$ represents the number of relations in the relation node set $R$;
in step 6, the sentence feature vectors obtained in step 2, the entity context vectors obtained in step 4, and the vector representations of the entity types and relations obtained in step 5 are taken as input, and the feature vector representation of the sentence packet is computed through the multi-source fusion attention mechanism;
for a sentence packet $B = \{s_1, s_2, \ldots, s_{n_b}\}$, where $n_b$ is the number of sentences in the sentence packet $B$, the corresponding target entity pair is $(e_h, e_t)$ and the corresponding relation label is $r \in R$;
for each sentence $s_i \in B$ in the sentence packet $B$, the sentence encoder produces the feature vector $\mathbf{s}_i$ of the sentence; for the target entity pair $(e_h, e_t)$ corresponding to the sentence packet, the entity attribute encoder and the neighbor graph encoder produce the knowledge context vector representations $\mathbf{k}_h, \mathbf{k}_t \in \mathbb{R}^{d_n}$ of the entities $e_h$ and $e_t$; for the target entity pair $(e_h, e_t)$ and the relation label $r$ of the sentence packet, the constraint graph encoder produces the corresponding entity type vectors $\mathbf{t}_h, \mathbf{t}_t \in \mathbb{R}^{d_g}$ and the relation vector $\mathbf{r} \in R'$;
the multi-source fusion attention mechanism adds the knowledge context vectors and the entity type vectors of the entities to the feature vectors of the sentences and of the relation, thereby providing more background knowledge and constraint information for relation extraction; the final vector representation $\hat{\mathbf{s}}_i$ of a sentence $s_i \in B$ is obtained by concatenating the sentence feature vector $\mathbf{s}_i$, the entity type vectors $\mathbf{t}_h, \mathbf{t}_t$ of the target entity pair, and the knowledge context vectors $\mathbf{k}_h, \mathbf{k}_t$ of the target entity pair, as shown in formula 11:

$$\hat{\mathbf{s}}_i = [\mathbf{s}_i; \mathbf{t}_h; \mathbf{t}_t; \mathbf{k}_h; \mathbf{k}_t] \tag{11}$$

wherein $\hat{\mathbf{s}}_i \in \mathbb{R}^{d_s}$ and $d_s = 3d_c + 2d_g + 2d_n$; the vector $\mathbf{s}_i$ has dimension $3d_c$, the vectors $\mathbf{t}_h, \mathbf{t}_t$ each have dimension $d_g$, and the vectors $\mathbf{k}_h, \mathbf{k}_t$ each have dimension $d_n$; ";" represents the vector concatenation operation;
the final vector representation $\mathbf{r}_f$ of the relation $r \in R$ is calculated as follows:
Figure FDA0003867522800000068
where Linear(·) denotes a linear fully connected layer, whose purpose is to align the dimension of the vector $\mathbf{r}_f$ with that of the vector $\hat{\mathbf{s}}_i$; ";" represents the vector concatenation operation;
after the final vector representation $\hat{\mathbf{s}}_i$ of each sentence $s_i \in B$ and the final vector representation $\mathbf{r}_f$ of the relation label $r$ are obtained, the feature vector representation $\mathbf{b}$ of the sentence packet $B$ is obtained through the attention mechanism, as shown in formula 13:

$$\mathbf{b} = \sum_{i=1}^{n_b} \alpha_i \hat{\mathbf{s}}_i \tag{13}$$

wherein $n_b$ is the number of sentences in the sentence packet $B$, and $\alpha_i$ is the attention weight of the final vector representation $\hat{\mathbf{s}}_i$ of the sentence $s_i \in B$, calculated as shown in formula 14:

$$\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{n_b} \exp(e_j)} \tag{14}$$

wherein $e_i$ is the matching score between the sentence vector $\hat{\mathbf{s}}_i$ and the relation label $r$ of the sentence packet, calculated as shown in formula 15:

$$e_i = \hat{\mathbf{s}}_i \cdot \mathbf{r}_f \tag{15}$$

wherein "·" represents the vector dot product operation;
in step 7, the relation label of the sentence packet is predicted through a relation classifier from the feature vector representation of the sentence packet obtained in step 6;
after the feature vector $\mathbf{b}$ of the sentence packet $B$ is obtained through the multi-source fusion attention mechanism, the relation classifier predicts the relation label of the sentence packet $B$; the prediction score $o_i$ of the sentence packet $B$ for each relation $r_i$ in the relation set $R$ is calculated as shown in formula 16:

$$o_i = \mathbf{r}_f^{i} \cdot \mathbf{b} + d_i \tag{16}$$

wherein $\mathbf{r}_f^{i}$ represents the final vector representation of the relation $r_i$, and $d_i$ represents a bias term;
after the prediction score of the sentence packet $B$ for each relation $r_i$ in the relation set $R$ is obtained, the probability $P$ that the sentence packet $B$ is classified into the relation $r_i$ is calculated with the softmax function, as shown in formula 17:

$$P(r_i \mid B; \theta) = \frac{\exp(o_i)}{\sum_{j=1}^{n_r} \exp(o_j)} \tag{17}$$

wherein $n_r$ represents the number of relations in the relation set $R$, $o_j$ represents the prediction score of the sentence packet $B$ for the $j$-th relation $r_j$ in the relation set $R$, and $\theta$ denotes the parameters of the relation extraction model.
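By way of non-limiting illustration, the graph convolution used by the neighbor graph encoder of step 4 and the constraint graph encoder of step 5 may be sketched as follows. The sketch assumes an unnormalized 0/1 adjacency matrix with undirected edges and no self-loops, and a layer of the form Relu(sum over j of A[i][j] * W * h[j] + b); these choices, the function names, and the toy values are assumptions of this sketch rather than details confirmed by the claim.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a 0/1 adjacency matrix from a list of (i, j) node index pairs."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0  # assumption: edges are treated as undirected
    return adj


def gcn_layer(adj, features, weight, bias):
    """One graph convolution layer; weight has shape out_dim x in_dim."""
    num_nodes = len(adj)
    in_dim = len(features[0])
    outputs = []
    for i in range(num_nodes):
        # Aggregate neighbor features: sum_j A[i][j] * h_j
        aggregated = [
            sum(adj[i][j] * features[j][d] for j in range(num_nodes))
            for d in range(in_dim)
        ]
        # Linear map, bias, then Relu
        outputs.append([
            max(0.0, sum(w * x for w, x in zip(row, aggregated)) + b)
            for row, b in zip(weight, bias)
        ])
    return outputs


def two_layer_encoder(adj, features, layers):
    """Apply the layers in order; `layers` is a list of (weight, bias) pairs."""
    hidden = features
    for weight, bias in layers:
        hidden = gcn_layer(adj, hidden, weight, bias)
    return hidden


identity = [[1.0, 0.0], [0.0, 1.0]]
adj = adjacency_matrix(2, [(0, 1)])
one_layer = gcn_layer(adj, [[1.0, 2.0], [3.0, 4.0]], identity, [0.0, 0.0])
two_layers = two_layer_encoder(
    adj, [[1.0, 2.0], [3.0, 4.0]], [(identity, [0.0, 0.0])] * 2
)
```

With two stacked layers each node aggregates information from nodes up to two hops away, which matches the use of one-hop and two-hop neighbor entities in step 1.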
5. The remote supervision relation extraction method fusing knowledge and constraint graphs as claimed in claim 4, wherein the relation extraction model is trained with the cross-entropy loss function as the loss function, calculated as shown in formula 18:
Figure FDA0003867522800000076
wherein $n_B$ denotes the number of sentence packets in the training set used, $r_i$ denotes the relation label of the sentence packet $B_i$, and $\theta$ denotes the parameters of the relation extraction model used in the present invention, comprising all the parameters in the above steps: the parameters of the sentence encoder in step 2, the parameters of the attribute encoder in step 3, the parameters of the entity neighbor graph encoder in step 4, the parameters of the constraint graph encoder in step 5, the parameters of the multi-source fusion attention mechanism in step 6, and the parameters of the relation classifier in step 7; the parameters include weight matrices and bias terms.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211185558.2A CN115545005A (en) 2022-09-27 2022-09-27 Remote supervision relation extraction method fusing knowledge and constraint graph

Publications (1)

Publication Number Publication Date
CN115545005A true CN115545005A (en) 2022-12-30

Family

ID=84730059

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737965A (en) * 2023-08-11 2023-09-12 深圳市腾讯计算机系统有限公司 Information acquisition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination