CN115545005A - Remote supervision relation extraction method fusing knowledge and constraint graph - Google Patents
Remote supervision relation extraction method fusing knowledge and constraint graph

Info
- Publication number
- CN115545005A (application number CN202211185558.2A)
- Authority
- CN
- China
- Prior art keywords
- entity
- vector
- sentence
- graph
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F40/205 — Natural language analysis; Parsing
- G06F16/355 — Information retrieval of unstructured textual data; Clustering; Classification; Class or cluster creation or modification
- G06F40/284 — Recognition of textual entities; Lexical analysis, e.g. tokenisation or collocates
- G06F40/295 — Recognition of textual entities; Named entity recognition
- G06F40/30 — Semantic analysis
Abstract
The invention discloses a remote supervision relation extraction method that fuses knowledge and a constraint graph, belonging to the technical field of text data relation extraction in computer natural language processing. The method supplements each entity with additional information from its knowledge context, and transfers information between relations through entity types and a relation constraint graph. A multi-source fusion attention mechanism fuses sentence semantic information, entity context information, and entity relation constraint information, aiding representation learning of sentences and entity relations and improving relation extraction performance. The method simultaneously addresses the data noise and long-tail relation problems in remote supervision relation extraction, is particularly suitable for relation extraction over large-scale text data and in complex text environments, and is effective for extracting structured factual information from unstructured text.
Description
Technical Field
The invention relates to a remote supervision relation extraction method that fuses knowledge and a constraint graph, and belongs to the technical field of text data relation extraction in computer natural language processing.
Background
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence, covering the theories and methods that enable effective communication between people and computers through natural language.
With the development of artificial intelligence technology, relation extraction based on machine learning and deep learning has become a hotspot in computer natural language processing. Training relation extraction models on large-scale text corpora to automatically mine the deep semantic features of language text enables intelligent semantic analysis and relation extraction, breaks the limitations of traditional hand-designed rules, and has become the main approach to relation extraction. Deep learning makes it possible to extract relations automatically from large-scale text data and complex text environments, turning unstructured text into structured factual information, with broad application prospects in natural language processing tasks such as question answering systems, knowledge graphs, and search engines.
In recent years, relation extraction based on remote supervision has become a popular direction within deep-learning-based relation extraction. This approach aligns a large knowledge base with a corpus to automatically obtain a large amount of labeled text data for training a relation extraction model. Specifically, if a <head entity, tail entity, relation> triple exists in the knowledge base, remote supervision assumes that every sentence in the corpus containing both the head entity and the tail entity expresses that relation. Compared with other relation extraction methods, remote supervision trains on large amounts of automatically labeled data, which alleviates the difficulty of obtaining training data for relation extraction, and the trained relation models generalize well.
However, datasets annotated automatically by remote supervision exhibit two serious problems. The first is noise: many sentences in the labeled dataset are labeled incorrectly, which harms the training of the relation extraction model. The second is the long-tail relation problem: a few relations account for most of the data, so the model cannot adequately learn the long-tail relations. Existing remote supervision relation extraction methods struggle to solve both problems at once, or focus on only one of them; as a result, effective model training is not achieved, the practical extraction performance is poor, and the mining of deep semantic features of text is severely hindered.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and, in order to effectively solve the data noise and long-tail problems in remote supervision relation extraction, creatively provides a remote supervision relation extraction method fusing knowledge and a constraint graph.
The innovations of the method are as follows: entity knowledge contexts supply additional information; entity types and a relation constraint graph transfer information between relations; and a multi-source fusion attention mechanism fuses sentence semantic information, entity context information, and entity relation constraint information, aiding representation learning of sentences and entity relations and improving relation extraction performance. For unstructured text annotated with entity information, the method can effectively identify the relation between entity pairs, and is particularly effective for relation extraction over large-scale text data and complex text environments and for extracting structured factual information from unstructured text.
The invention is realized by adopting the following technical scheme.
A remote supervision relation extraction method fusing knowledge and constraint graphs comprises the following steps:
step 1: and collecting the neighbor entities of the entities in the remote supervision data set in the knowledge base, wherein the neighbor entities comprise one-hop neighbor entities and two-hop neighbor entities. An entity set is formed by entities in a remote supervision data set and neighbor entities thereof, an entity neighbor graph is constructed by using the entity set, and a constraint graph is constructed by combining a relationship set between the entities. For remote surveillance data sets, sentences with identical entity pairs are combined into sentence packets.
For the entities in the remote supervision dataset, the Flair named entity recognition tool can be used to identify their entity types, which form the entity type set.
Step 2: obtain a word embedding vector for each word of each sentence in the packet, as well as a feature vector representation of each sentence.
Specifically, for the sentence packets obtained in step 1, the word embedding vector of each word in a sentence is obtained with the word2vec tool, and the feature vector representation of the sentence is obtained using a piecewise convolutional neural network (PCNN) as the sentence encoder.
Step 3: for each entity in the entity set, use an attribute encoder to collect its entity attribute information from the knowledge base, comprising entity name, entity alias, entity type, and entity description. For each entity, concatenate the attribute information and feed it into the attribute encoder; the attribute vector of the entity is obtained by column-wise averaging of the output matrix.
In particular, BERT may be used as the attribute encoder.
Step 4: construct an adjacency matrix from the entity neighbor graph; with the adjacency matrix and the entity attribute vectors as input, a neighbor graph encoder built from a graph convolutional neural network produces the knowledge context vector representation of the target entity.
Step 5: construct an adjacency matrix from the constraint graph; with the adjacency matrix and the vector representations of entity types and relations as input, a constraint graph encoder built from a graph convolutional neural network produces the vector representations of the entity types and relations.
Step 6: with the sentence feature vector representations, the entity context vector representations, and the entity type and relation vector representations as input, compute the feature vector representation of the sentence packet through a multi-source fusion attention mechanism.
Step 7: from the feature vector representation of the sentence packet, predict the packet's relation label with a relation classifier.
Further, in step 1, the knowledge base contains entity pairs and the relations between them, and each entity has attribute information such as its entity name, entity alias, entity type, and entity description. The remote supervision dataset is a training corpus labeled by the remote supervision method: natural language text is annotated using the entity pairs in the knowledge base and their corresponding relations. Specifically, if a <head entity, tail entity, relation> triple exists in the knowledge base, any sentence containing both the head entity and the tail entity is assumed to express that relation, thereby yielding the labeled data.
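The remote supervision labeling rule described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation; the entity names and the simple substring matching are assumptions for the example.

```python
# Distant-supervision labeling sketch: every sentence that mentions both the
# head and tail entity of a knowledge-base triple is labeled with that
# triple's relation (the remote supervision assumption).
def distant_label(triples, sentences):
    """triples: list of (head, tail, relation); sentences: list of raw strings."""
    labeled = []
    for head, tail, relation in triples:
        for s in sentences:
            if head in s and tail in s:  # both entities occur in the sentence
                labeled.append((s, head, tail, relation))
    return labeled

kb = [("Beijing", "China", "capital_of")]
corpus = ["Beijing is the capital of China.", "Paris is in France."]
labeled = distant_label(kb, corpus)
```

Note that this rule is exactly the source of the noise problem discussed above: a sentence can mention both entities without expressing the relation.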
Further, in step 1, the entity type set can use the 18 entity types defined in OntoNotes 5.0 plus one special "Other" entity type, for a total of 19 entity types; the Flair named entity recognition tool identifies the types of entities in the dataset.
Further, in step 2, the piecewise convolutional neural network (PCNN) is a neural network model that takes the sequence of word feature vectors of a sentence as input and generates a sentence feature vector representation through convolution and piecewise pooling based on the positions of the two entities in the sentence.
Further, in step 3, BERT is a deep neural network built on a multi-head self-attention mechanism and pre-trained on large corpora; it outputs feature vector representations carrying semantic information for input text.
Further, in steps 4 and 5, the graph convolutional neural network is a neural network that propagates and aggregates information over graph data through convolution operations, thereby extracting feature information.
Further, in step 6, the multi-source fusion attention mechanism, first proposed by the present invention, is an attention-based information fusion scheme that fuses sentence semantic information, entity knowledge contexts, and entity relation constraint information to obtain the feature vector representation of a sentence packet.
Further, in step 7, the relation classifier works as follows: the feature vector representation of the sentence packet is matched against the feature representation of each relation in the relation set by a dot product to compute a prediction score, and a softmax operation then gives the probability that the packet is classified into each relation.
Advantageous effects
Compared with the prior art, the invention has the following advantages:
First, by integrating knowledge-base information about entities into the relation extraction model as a knowledge context supplement, the method helps the model judge whether a remote supervision label is correct, effectively reducing data noise in remote supervision relation extraction.
Secondly, the method uses the constraint graph of entity types and inter-entity relations to connect different relations indirectly through shared entity types, helping information propagate between relations and alleviating the long-tail relation problem of remote supervision relation extraction.
Thirdly, the method fuses the entity knowledge context and constraint graph information with the in-packet sentence information through multi-source fusion attention; the resulting packet feature vector representation better reflects the relation of the target entity pair, addressing the data noise problem and the long-tail relation problem of remote supervision relation extraction simultaneously.
Drawings
FIG. 1 is a general framework schematic of the process of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
A remote supervision relation extraction method fusing knowledge and constraint graphs is shown in figure 1 and comprises the following steps:
step 1: and collecting one-hop or two-hop neighbor entities of the entities in the remote supervision data set in the knowledge base, wherein the entities in the remote supervision data set and the neighbor entities thereof form an entity set, and the entity set is used for constructing an entity neighbor graph. And identifying the entity types of the entities by using a Flair named entity identification tool to form an entity type set according to the entities in the remote supervision data set, and constructing a constraint graph by combining the relationship set among the entities. For remote supervisory data sets, sentences having the same entity pairs are combined into sentence packets.
Specifically, the entity neighbor graph is defined as a graph K = {E, N}, where E is the set of entity nodes (i.e., the entity set) and N is the set of edges. If two entities e_1, e_2 ∈ E appear together in a triple in the knowledge base, then there is an edge (e_1, e_2) ∈ N.
The constraint graph is defined as a graph G = {T, R, C}, where T is the set of entity type nodes; the 18 coarse entity types defined in OntoNotes 5.0 may be used, and entity types outside these 18 are represented by a special node "Other", so the set T contains 19 entity type nodes. The Flair named entity recognition tool identifies the types of entities in the dataset.
Let R be the set of relation nodes formed by all relations and C the set of constraint edges. If entities e_1, e_2 have entity types t_1, t_2 and entities e_1, e_2 have a relation r, then there is a constraint (t_1, r, t_2) ∈ C. Each constraint (t_1, r, t_2) corresponds to the two edges (t_1, r) and (r, t_2).
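The expansion of each typed constraint into two edges can be sketched as follows. This is an illustrative sketch, assuming the constraint-to-edge mapping described above; the example types and relations are hypothetical.

```python
# Constraint-graph edge construction sketch: each constraint (t1, r, t2) in C
# yields the two edges (t1, r) and (r, t2) in the edge set D_G, linking
# entity-type nodes to relation nodes.
def build_constraint_edges(constraints):
    """constraints: iterable of (t1, r, t2) tuples; returns the edge set."""
    edges = set()
    for t1, r, t2 in constraints:
        edges.add((t1, r))   # edge from head-type node to relation node
        edges.add((r, t2))   # edge from relation node to tail-type node
    return edges

constraints = [("PERSON", "born_in", "GPE"), ("ORG", "located_in", "GPE")]
edges = build_constraint_edges(constraints)
```

Because distinct relations can share entity-type nodes (here, "GPE"), the resulting graph lets information flow between relations, which is the mechanism used against the long-tail problem.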
Step 2: for the sentence packets obtained in step 1, obtain the word embedding vector of each word in each sentence with the word2vec tool, and obtain the feature vector representation of each sentence with a piecewise convolutional neural network (PCNN) as the sentence encoder.
Specifically, for a sentence s = {w_1, w_2, …, w_{n_s}} in the packet, where n_s is the length of sentence s, the input for each word w_i ∈ s consists of its word embedding vector and its position feature vectors. The word embedding vector v_i can be obtained by word2vec pre-training and has dimension d_w. The position feature vectors p_i^h, p_i^t are the embedding vectors of the two relative distances between word w_i and the target entity pair (e_h, e_t) in the sentence, each of dimension d_p. The relative distances of the words are computed with respect to the position where each entity of the pair first appears in the sentence.
The input representation w_i of word w_i is obtained by concatenation, with w_i ∈ R^d and d = d_w + 2d_p, as shown in Equation 1:

w_i = [v_i; p_i^h; p_i^t] (1)

where ";" denotes the vector concatenation operation. The input of sentence s is represented as the matrix X ∈ R^(l_s × d).
The input representation matrix X of sentence s is encoded with the piecewise convolutional neural network (PCNN) to obtain a sentence feature vector of fixed dimension.
The piecewise convolutional neural network comprises a convolutional layer and a piecewise max-pooling layer. The parameter matrix of the convolutional layer is W ∈ R^(w × d), where w is the length of the convolution sliding window. The sub-matrix q_m of matrix X under the m-th sliding window is given by Equation 2:

q_m = X_{m−w+1:m} (1 ≤ m ≤ l_s + w − 1) (2)

where l_s is the length of sentence s, and m−w+1:m is the index interval of the window's word sequence within the word sequence of the original sentence.
The convolution of sub-matrix q_m with the convolution-kernel parameter matrix W yields the feature value c_m, as shown in Equation 3:

c_m = W ⊙ q_m (3)

where ⊙ denotes the sum of the element-wise product. During convolution, the window slides with stride 1, and the parts of the convolution window that exceed the sentence boundary are filled with zero vectors; this finally yields a feature vector c ∈ R^(l_s + w − 1) representing matrix X.
in practical use, in order to capture different characteristics of a sentence, a plurality of convolution checks are usually used to perform convolution on a sentence expression matrix, and d is adopted in the invention c A set of convolution kernels represented asAfter convolution calculation, the expression matrix X corresponds to d c A feature vector
The piecewise max-pooling layer of the PCNN takes the positions of the entity pair in sentence s as segmentation points, splitting each feature vector into 3 parts, and then applies a max-pooling operation to each part. Specifically, any feature vector c_i is segmented into 3 sub-vectors {c_{i,1}; c_{i,2}; c_{i,3}}. Max-pooling each feature sub-vector yields a pooled feature vector f_i of dimension 3, as shown in Equation 4:

f_i = [max(c_{i,1}); max(c_{i,2}); max(c_{i,3})] (4)

where max(·) denotes the maximum operation.
The d_c feature vectors {c_1, …, c_{d_c}} of matrix X each pass through the piecewise max-pooling layer; the resulting pooled feature vectors are concatenated and passed through the activation function tanh(·) to obtain the feature vector representation s ∈ R^(3d_c) of X, as shown in Equation 5:

s = tanh([f_1; f_2; …; f_{d_c}]) (5)

where f_1, …, f_{d_c} are the d_c pooled feature vectors. The resulting s is the feature vector representation of sentence s.
Step 3: using BERT as the attribute encoder, collect four kinds of entity attribute information — entity name, entity alias, entity type, and entity description — from the knowledge base for each entity in the entity set of step 1. For each entity, the concatenated attribute information is input to BERT, and the attribute vector of the entity is obtained by column-wise averaging of the output matrix.
Specifically, for each entity e_i ∈ E, the corresponding attribute vector is a_i ∈ R^(d_a), where d_a is the dimension of the entity attribute vector. The attribute vector a_i of entity e_i is computed as shown in Equation 6:

a_i = Mean(BERT([CLS] name_i [SEP] alias_i [SEP] type_i [SEP] description_i)) (6)

where Mean(·) denotes the column-wise averaging operation; BERT(·) denotes the output matrix computed with the pre-trained BERT model; name_i, alias_i, type_i, and description_i denote the entity name, entity alias, knowledge-base entity type, and entity description of entity e_i; the attribute fields are separated by the symbol [SEP]; and [CLS] is the start identifier required by BERT input.
Step 4: construct an adjacency matrix from the entity neighbor graph of step 1; with the adjacency matrix and the entity attribute vectors obtained in step 3 as input, the neighbor graph encoder built from a graph convolutional neural network outputs the knowledge context vector representation of the target entity.
The target entity may be any entity in the entity set of step 1. For the entity neighbor graph K = {E, N}, with entity node set E and edge set N, the adjacency matrix A ∈ R^(|E| × |E|) is computed as shown in Equation 7:

A_{ij} = 1 if (v_i, v_j) ∈ N, and A_{ij} = 0 otherwise (7)

where |E| is the number of entity nodes; v_i, v_j ∈ E are any two entity nodes of the entity node set; and (v_i, v_j) ∈ N means there is an edge between entity nodes v_i and v_j in the entity neighbor graph.
A two-layer graph convolutional neural network is selected as the neighbor graph encoder, with the entity attribute vector a_i obtained in step 3 as the initial vector of entity e_i ∈ E in the entity neighbor graph. For node v_i ∈ E, the output h_i^(k) of the k-th graph convolution layer is computed as shown in Equation 8:

h_i^(k) = Relu( Σ_{j=1}^{|E|} A_{ij} W^(k) h_j^(k−1) + b^(k) ) (8)

where Relu(·) is a nonlinear activation function, |E| is the number of entity nodes, W^(k) is the weight matrix of the k-th layer of the neighbor graph encoder, b^(k) is the bias term of the k-th layer, and h_j^(k−1) is the feature vector output by the (k−1)-th layer for the j-th entity node.
selecting the output of the last layer of the neighbor graph encoder as an entity e i E knowledge contextRepresenting a vectord n Representing the output vector dimension of the last layer of the neighbor graph encoder.
For an entity node set E of an entity neighbor graph K, a knowledge context matrix is obtained after passing through a neighbor graph encoderEach row vector in the matrix K corresponds to knowledge context vector representation of an entity, and the knowledge context vector representation fuses entity neighbors and attribute information thereof.
Step 5: construct an adjacency matrix from the constraint graph of step 1; with the adjacency matrix and the vector representations of entity types and relations as input, the constraint graph encoder built from a graph convolutional neural network outputs the vector representations of the entity types and relations.
Specifically, for the constraint graph G = {T, R, C}, the nodes of the graph comprise entity type nodes and relation nodes, and the node set of the constraint graph is V_G = T ∪ R. The edge set of the constraint graph G is defined as D_G; for each constraint (t_1, r, t_2) ∈ C, D_G contains the two corresponding edges (t_1, r) and (r, t_2). The adjacency matrix A_G ∈ R^(|V_G| × |V_G|) of the constraint graph G is computed as shown in Equation 9:

A_{G,ij} = 1 if (v_i, v_j) ∈ D_G, and A_{G,ij} = 0 otherwise (9)

where |V_G| is the number of nodes of the constraint graph; v_i, v_j ∈ V_G are any two nodes of the constraint graph node set; and (v_i, v_j) ∈ D_G means there is an edge between nodes v_i and v_j in the constraint graph.
A two-layer graph convolutional neural network is selected as the constraint graph encoder, and the initial input vector of each node in the node set V_G of the constraint graph G is randomly initialized. For node v_i ∈ V_G, the output g_i^(k) of the k-th graph convolution layer is computed as shown in Equation 10:

g_i^(k) = Relu( Σ_{j=1}^{|V_G|} A_{G,ij} M^(k) g_j^(k−1) + q^(k) ) (10)

where Relu(·) is a nonlinear activation function, |V_G| is the number of nodes of the constraint graph, M^(k) is the weight matrix of the k-th layer of the constraint graph encoder, q^(k) is the bias term of the k-th layer, and g_j^(k−1) is the feature vector output by the (k−1)-th layer for the j-th node.
The output of the last layer of the constraint graph encoder is taken as the feature vector representation of node v_i ∈ V_G. For the node set V_G of the constraint graph G, the constraint graph encoder yields a feature matrix V'_G ∈ R^(|V_G| × d_g), where d_g is the output vector dimension of the last layer of the constraint graph encoder; each row vector of V'_G corresponds to the vector representation of one node in the node set V_G. Since V_G contains both entity type nodes and relation nodes, splitting V'_G yields the feature matrix T' ∈ R^(n_t × d_g) of the entity type node set T and the feature matrix R' ∈ R^(n_r × d_g) of the relation node set R, where n_t is the number of entity types in T and n_r is the number of relations in R.
Step 6: with the sentence feature vector representations obtained in step 2, the entity context vector representations obtained in step 4, and the entity type and relation vector representations obtained in step 5 as input, compute the feature vector representation of the sentence packet through the multi-source fusion attention mechanism.
Specifically, for a sentence packet B = {s_1, s_2, …, s_{n_b}}, where n_b is the number of sentences in packet B, the packet corresponds to the target entity pair (e_h, e_t), and its relation label is r ∈ R.

For each sentence s_i ∈ B, the sentence encoder gives the feature vector s_i. For the target entity pair (e_h, e_t) of the packet, the attribute encoder and the neighbor graph encoder give the corresponding knowledge context vector representations n_h and n_t. For the target entity pair (e_h, e_t) and relation label r of the packet, the constraint graph encoder gives the corresponding entity type vectors t_h, t_t and the relation vector r' ∈ R'.
The multi-source fusion attention mechanism adds the knowledge context vectors and entity type vectors of the entities to the feature vectors of the sentences and relations, providing more background knowledge and constraint information for relation extraction. Specifically, the final vector representation ŝ_i of sentence s_i ∈ B is obtained by concatenating the sentence feature vector s_i, the entity type vectors t_h, t_t of the target entity pair, and the knowledge context vectors n_h, n_t of the target entity pair, as shown in Equation 11:

ŝ_i = [s_i; t_h; t_t; n_h; n_t] (11)

where ŝ_i ∈ R^(d_s), d_s = 3d_c + 2d_g + 2d_n; vector s_i has dimension 3d_c, vectors t_h and t_t have dimension d_g, and vectors n_h and n_t have dimension d_n; ";" denotes the vector concatenation operation.
The final vector representation r_f of relation r ∈ R is computed as shown in Equation 12:

r_f = Linear([t_h; r'; t_t]) (12)

where Linear(·) denotes a linear fully-connected layer whose purpose is to align the dimension of r_f with that of ŝ_i, and ";" denotes the vector concatenation operation.
After obtaining the final vector representation b_i of each sentence s_i ∈ B and the final vector representation r_f of the relation label r, the feature vector representation b of sentence bag B is obtained through the attention mechanism, as shown in Equation 13:

b = Σ_{i=1}^{n_b} α_i b_i (13)

where n_b is the number of sentences in bag B and α_i is the attention weight of the final vector representation b_i of sentence s_i, computed as shown in Equation 14:

α_i = exp(e_i) / Σ_{j=1}^{n_b} exp(e_j) (14)

where e_i is the matching score between the sentence vector b_i and the relation label r of the bag, computed as shown in Equation 15:

e_i = b_i · r_f (15)

where "·" denotes the vector dot product operation.
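The selective attention of Equations 13-15 can be sketched in plain Python. This is an illustrative sketch only; the function name `bag_representation` and the toy vectors are assumptions, not part of the claimed method:

```python
import math

def bag_representation(sentence_vecs, relation_vec):
    """Selective attention over a sentence bag (Eqs. 13-15, sketch).

    sentence_vecs: list of final sentence vectors b_i (equal-length lists).
    relation_vec:  final relation vector r_f, same dimension as each b_i.
    Returns (bag vector b, attention weights alpha_i).
    """
    # Eq. 15: matching score e_i = b_i . r_f (vector dot product)
    scores = [sum(bi * rj for bi, rj in zip(b, relation_vec))
              for b in sentence_vecs]
    # Eq. 14: softmax over the scores gives the attention weights alpha_i
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(e - m) for e in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    # Eq. 13: bag vector b = sum_i alpha_i * b_i
    dim = len(sentence_vecs[0])
    bag = [sum(a * v[d] for a, v in zip(alphas, sentence_vecs))
           for d in range(dim)]
    return bag, alphas
```

A sentence whose vector aligns better with r_f receives a larger weight α_i, which is how sentences mislabeled by distant supervision are down-weighted within the bag.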
Step 7: using the feature vector representation of the sentence bag obtained in step 6, predict the relation label of the bag through a relation classifier.
After the feature vector b of bag B is obtained through the multi-source fusion attention mechanism, the relation classifier predicts the relation label of bag B. The prediction score o_i of bag B for each relation r_i in the relation set R is computed as shown in Equation 16:

o_i = b · r_f^i + d_i (16)

where r_f^i denotes the final vector representation of relation r_i and d_i denotes a bias term.
After the prediction score o_i of bag B for each relation r_i in the relation set R is obtained, the probability that bag B is classified into relation r_i is computed through the softmax function, as shown in Equation 17:

P(r_i | B; θ) = exp(o_i) / Σ_{j=1}^{n_r} exp(o_j) (17)

where n_r denotes the number of relations in the relation set R, o_j denotes the prediction score of bag B for the j-th relation r_j in R, and θ denotes the parameters of the relation extraction model.
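The scoring and softmax of Equations 16-17 can be sketched as follows; the function name and the toy relation vectors are illustrative assumptions:

```python
import math

def relation_probabilities(bag_vec, relation_vecs, biases):
    """Score a bag against every relation and normalize (Eqs. 16-17, sketch).

    bag_vec:       bag feature vector b.
    relation_vecs: one final vector r_f^i per relation in R.
    biases:        one bias term d_i per relation.
    Returns the probability distribution over the relation set.
    """
    # Eq. 16: prediction score o_i = b . r_f^i + d_i
    scores = [sum(x * y for x, y in zip(bag_vec, rv)) + bias
              for rv, bias in zip(relation_vecs, biases)]
    # Eq. 17: softmax over all n_r relations
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```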
Further, the relation extraction model of the invention is trained using the cross-entropy loss function, computed as shown in Equation 18:

J(θ) = − Σ_{i=1}^{n_B} log P(r_i | B_i; θ) (18)

where n_B denotes the number of sentence bags in the training set used, r_i denotes the relation label of sentence bag B_i, and θ denotes the parameters of the relation extraction model, which comprise all parameters of the above steps: the parameters of the sentence encoder in step 2, the attribute encoder in step 3, the entity neighbor graph encoder in step 4, the constraint graph encoder in step 5, the multi-source fusion attention mechanism in step 6, and the relation classifier in step 7. The parameters include the weight matrices and bias terms.
In the invention, the relation extraction model used by the relation extraction method can minimize the loss function J(θ) with the mini-batch stochastic gradient descent optimization algorithm (SGD), thereby better optimizing the parameters.
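The cross-entropy objective of Equation 18 can be sketched as below. Only the loss value is shown; in practice the gradients for SGD would come from an autodiff framework, and the function name is an illustrative assumption:

```python
import math

def cross_entropy_loss(predicted_probs, label_indices):
    """Cross-entropy loss J(theta) over n_B training bags (Eq. 18, sketch).

    predicted_probs: per-bag probability distributions over the relation set,
                     e.g. the softmax outputs of the relation classifier.
    label_indices:   index of the gold relation label of each bag.
    """
    # J(theta) = - sum_i log P(r_i | B_i; theta); SGD minimizes this quantity
    return -sum(math.log(probs[y])
                for probs, y in zip(predicted_probs, label_indices))
```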
Claims (5)
1. A remote supervision relation extraction method fusing knowledge and constraint graphs is characterized by comprising the following steps:
step 1: collect the neighbor entities of the entities of the remote supervision dataset from a knowledge base, the neighbor entities comprising one-hop and two-hop neighbor entities; form an entity set from the entities of the remote supervision dataset and their neighbor entities, construct an entity neighbor graph from the entity set, and construct a constraint graph in combination with the set of relations between the entities; for the remote supervision dataset, merge sentences with the same entity pair into a sentence bag;
step 2: obtain the word embedding vector of each word of the sentences in the bag and the feature vector representation of each sentence;
step 3: collect, with an attribute encoder, the entity attribute information of each entity of the entity set from the knowledge base, the entity attribute information comprising the entity name, entity type, and entity description; for each entity, concatenate the attribute information and input it into the attribute encoder, which outputs a matrix, and obtain the attribute vector of the entity by column-vector averaging;
step 4: construct an adjacency matrix from the entity neighbor graph, and with the adjacency matrix and the entity attribute vectors as input, obtain the knowledge context vector representation of the target entity through a neighbor graph encoder constructed from a graph convolutional neural network;
step 5: construct an adjacency matrix from the constraint graph, and with the adjacency matrix and the vector representations of entity types and relations as input, obtain the vector representations of the entity types and relations through a constraint graph encoder constructed from a graph convolutional neural network;
step 6: with the sentence feature vector representations, entity knowledge context vector representations, and entity type and relation vector representations as input, compute the feature vector representation of the sentence bag through a multi-source fusion attention mechanism;
step 7: from the feature vector representation of the sentence bag, predict the relation label of the sentence bag through a relation classifier.
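Step 1's grouping of sentences that share a target entity pair into a bag can be sketched as follows; the dictionary-based grouping and the tuple layout are illustrative assumptions, not part of the claims:

```python
from collections import defaultdict

def build_bags(labeled_sentences):
    """Group distant-supervision sentences into bags keyed by entity pair.

    labeled_sentences: iterable of (head_entity, tail_entity, sentence) tuples.
    Returns a dict mapping each target entity pair to its sentence bag.
    """
    bags = defaultdict(list)
    for head, tail, sentence in labeled_sentences:
        bags[(head, tail)].append(sentence)  # same pair -> same bag
    return dict(bags)
```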
2. The remote supervision relation extraction method fusing knowledge and constraint graphs as claimed in claim 1, wherein in step 1, the knowledge base contains entity pairs, the relations between entity pairs, and the attribute information of each entity: entity name, entity alias, entity type, and entity description;
the remote supervision dataset is a training corpus labeled by the remote supervision method: natural language text is labeled using the entity pairs and corresponding relations in the knowledge base, under the assumption that if a triple <head entity, tail entity, relation> exists in the knowledge base, then any sentence containing both the head entity and the tail entity expresses the relation of that triple, thereby producing labeled data.
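The distant-supervision labeling assumption described above can be sketched as follows; the substring matching and the function name are illustrative simplifications (real systems match entity mentions, not raw substrings):

```python
def distant_supervision_label(kb_triples, sentences):
    """Label raw sentences with knowledge-base triples (sketch).

    kb_triples: iterable of (head, tail, relation) triples from the KB.
    sentences:  raw natural-language sentences.
    Any sentence containing both the head and the tail entity is assumed
    to express the relation of the triple.
    """
    labeled = []
    for head, tail, relation in kb_triples:
        for s in sentences:
            if head in s and tail in s:
                labeled.append((head, tail, relation, s))
    return labeled
```

This matching is intentionally noisy, which is why the attention mechanism of step 6 is needed to down-weight wrongly labeled sentences.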
3. The remote supervision relation extraction method fusing knowledge and constraint graphs as claimed in claim 1, wherein in step 2, the word embedding vector of each word of a sentence in the bag is obtained through the word2vec tool, and the feature vector representation of the sentence is obtained using a piecewise convolutional neural network (PCNN) as the sentence encoder; the piecewise convolutional neural network is a neural network model that takes the feature vector sequence of the words of the sentence as input and generates the sentence feature vector representation through convolution and piecewise max pooling based on the positions of the two entities in the sentence.
4. The remote supervision relation extraction method fusing knowledge and constraint graphs as claimed in claim 1, wherein in step 1, the entity neighbor graph is defined as a graph K = {E, N}, where E denotes the set of entity nodes, i.e., the set of entities, and N denotes the set of edges; if two entities e_1, e_2 ∈ E appear together in a triple of the knowledge base, then there is an edge (e_1, e_2) ∈ N;
the constraint graph is defined as a graph G = {T, R, C}, where T is the set of entity type nodes; the types of the entities in the dataset are identified with the Flair named entity recognition tool;
let R be the set of relation nodes formed by all relations and C be the set of constraint edges; if entities e_1, e_2 have entity types t_1, t_2 ∈ T respectively and entities e_1, e_2 have the relation r, then there is a constraint (t_1, r, t_2) ∈ C; each constraint (t_1, r, t_2) corresponds to the two edges (t_1, r) and (r, t_2);
in step 2, for the sentence bag obtained in step 1 and for each sentence s = {w_1, w_2, …, w_{n_s}} in the bag, where n_s is the length of sentence s, the input of each word w_i ∈ s consists of its word embedding vector and its position feature vectors: the word embedding vector v_i has dimension d_w; the position feature vectors p_i^h and p_i^t are the embedded vector representations of the two relative distances between word w_i and the target entity pair (e_h, e_t) in the sentence, each of dimension d_p; the relative distances of the other words are computed taking the position of the first occurrence of the entity pair in the sentence as the reference position;
the input representation w_i of word w_i, with w_i ∈ R^d and d = d_w + 2d_p, is obtained by concatenation, as shown in Equation 1:

w_i = [v_i; p_i^h; p_i^t] (1)

where ";" denotes the vector concatenation operation; the input of sentence s is represented as the matrix X ∈ R^{n_s × d};
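The relative-distance features and the concatenation of Equation 1 can be sketched as follows; the helper names and the toy embeddings are illustrative assumptions:

```python
def relative_positions(tokens, head, tail):
    """Relative distances of each word to the entity pair (sketch).

    Uses the first occurrence of each entity as the reference position,
    as the claim specifies. Returns (dist_to_head, dist_to_tail) per token.
    """
    h = tokens.index(head)   # first occurrence of the head entity
    t = tokens.index(tail)   # first occurrence of the tail entity
    return [(i - h, i - t) for i in range(len(tokens))]

def word_input(word_vec, pos_h_vec, pos_t_vec):
    """Eq. 1: w_i = [v_i; p_i^h; p_i^t], dimension d = d_w + 2 * d_p."""
    return word_vec + pos_h_vec + pos_t_vec   # list concatenation plays ";"
```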
a piecewise convolutional neural network is used to encode the input representation matrix X of sentence s into a sentence feature vector of fixed dimension; the piecewise convolutional neural network comprises a convolution layer and a piecewise max pooling layer; the parameter matrix of a convolution kernel is denoted W ∈ R^{w × d}, where w denotes the length of the convolution sliding window; the sub-matrix q_m of matrix X under the m-th sliding window is shown in Equation 2:

q_m = X_{m−w+1:m} (1 ≤ m ≤ l_s + w − 1) (2)

where l_s denotes the length of sentence s and m−w+1:m denotes the index interval, in the word sequence of the original sentence, covered by the sliding window;
the convolution of sub-matrix q_m with the parameter matrix W of the convolution kernel yields the m-th element c_m of the feature vector, as shown in Equation 3:

c_m = W ⊙ q_m (3)

where ⊙ denotes the sum of the element-wise product; in the convolution process, sliding convolution is performed with a stride of 1, and the part of the convolution window that exceeds the sentence boundary is padded with zero vectors; finally, a feature vector c ∈ R^{l_s + w − 1} representing matrix X is obtained;
with a set of d_c convolution kernels, represented as {W_1, W_2, …, W_{d_c}}, after the convolution calculation the representation matrix X corresponds to d_c feature vectors {c_1, c_2, …, c_{d_c}};
the piecewise max pooling layer of the piecewise convolutional neural network takes the positions of the entity pair in sentence s as segmentation points, divides each feature vector into 3 parts, and then applies the max pooling operation to each part separately; any feature vector c_i is divided into 3 sub-vectors {c_{i,1}; c_{i,2}; c_{i,3}}; max pooling is applied to each feature sub-vector to obtain the pooled feature vector f_i of dimension 3, as shown in Equation 4:

f_i = [max(c_{i,1}); max(c_{i,2}); max(c_{i,3})] (4)

where max(·) denotes the maximum operation;
after the d_c feature vectors {c_1, c_2, …, c_{d_c}} of matrix X each pass through the piecewise max pooling layer, the resulting pooled feature vectors are concatenated and passed through the activation function tanh(·) to obtain the feature vector representation s ∈ R^{3d_c} of X, as shown in Equation 5:

s = tanh([f_1; f_2; …; f_{d_c}]) (5)

where f_1, …, f_{d_c} denote the d_c pooled feature vectors; the vector thus obtained is the feature vector representation of sentence s;
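The piecewise max pooling of Equations 4-5 can be sketched as follows; this is an illustrative simplification (the convolution step is assumed to have already produced the feature vectors), and the function names are assumptions:

```python
import math

def piecewise_max_pool(feature_vec, head_pos, tail_pos):
    """Eq. 4: split one convolution feature vector at the two entity
    positions into 3 segments and max-pool each, giving f_i of dimension 3."""
    a, b = sorted((head_pos, tail_pos))
    parts = [feature_vec[:a + 1], feature_vec[a + 1:b + 1], feature_vec[b + 1:]]
    return [max(p) for p in parts if p]  # each non-empty segment -> one max

def pcnn_sentence_vector(feature_vecs, head_pos, tail_pos):
    """Eq. 5: concatenate the d_c pooled vectors and apply tanh,
    yielding the 3*d_c-dimensional sentence feature vector."""
    pooled = []
    for c in feature_vecs:
        pooled.extend(piecewise_max_pool(c, head_pos, tail_pos))
    return [math.tanh(x) for x in pooled]
```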
in step 3, BERT is used as the attribute encoder; for each entity of the entity set of step 1, four kinds of entity attribute information are collected from the knowledge base: entity name, entity alias, entity type, and entity description; for each entity, the attribute information is concatenated and input into BERT, which outputs a matrix, and the attribute vector of the entity is obtained by column-vector averaging;
for each entity e_i ∈ E, the corresponding attribute vector a_i ∈ R^{d_a} is obtained, where d_a denotes the dimension of the entity attribute vector; the attribute vector a_i of entity e_i is computed as shown in Equation 6:

a_i = Mean(BERT([CLS] Name_i [SEP] Alias_i [SEP] Type_i [SEP] Description_i)) (6)

where Mean(·) denotes the column-averaging operation; BERT(·) denotes the output matrix computed with the BERT pre-training model; Name_i denotes the entity name of entity e_i; Alias_i denotes the entity alias of entity e_i; Type_i denotes the entity type of e_i in the knowledge base; Description_i denotes the entity description of e_i; the attribute fields are separated by the symbol [SEP], and [CLS] is the start identifier added to the BERT input;
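The input format and the Mean(·) operation of Equation 6 can be sketched without loading a real BERT model; `column_mean` below stands in for Mean(·) applied to an assumed encoder output matrix (one row per token), and both function names are illustrative assumptions:

```python
def build_bert_input(name, alias, etype, description):
    """Eq. 6 input format: [CLS] start identifier and [SEP]-separated fields."""
    return "[CLS] " + " [SEP] ".join([name, alias, etype, description])

def column_mean(matrix):
    """Mean(.): average the encoder output matrix column by column,
    collapsing the per-token rows into one attribute vector of dimension d_a."""
    n = len(matrix)
    d = len(matrix[0])
    return [sum(row[j] for row in matrix) / n for j in range(d)]
```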
in step 4, an adjacency matrix is constructed from the entity neighbor graph of step 1; with this adjacency matrix and the entity attribute vectors obtained in step 3 as input, the knowledge context vector representation of the target entity is obtained through a neighbor graph encoder constructed from a graph convolutional neural network;
the target entity is any entity of the entity set of step 1; for the entity neighbor graph K = {E, N}, whose set of entity nodes is E and set of edges is N, each element of its adjacency matrix A ∈ R^{|E| × |E|} is computed as shown in Equation 7:

A_{ij} = 1 if (v_i, v_j) ∈ N, and A_{ij} = 0 otherwise (7)

where |E| denotes the number of entity nodes; v_i, v_j ∈ E denote any two entity nodes of the entity node set; and (v_i, v_j) ∈ N indicates that there is an edge between entity nodes v_i and v_j in the entity neighbor graph;
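The adjacency-matrix construction of Equation 7 can be sketched as follows; treating the edges as undirected is an assumption consistent with co-occurrence in a triple, and the function name is illustrative:

```python
def adjacency_matrix(num_nodes, edges):
    """Eq. 7: A[i][j] = 1 when (v_i, v_j) is an edge of the graph, else 0.

    edges: iterable of (i, j) node-index pairs; treated as undirected,
    matching the entity neighbor graph's co-occurrence edges.
    """
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1   # undirected edge
    return A
```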
a two-layer graph convolutional neural network is selected as the neighbor graph encoder, and the entity attribute vector a_i obtained in step 3 is used as the initial vector of entity e_i ∈ E in the entity neighbor graph; for node v_i ∈ E, the output h_i^{(k)} of the k-th graph convolution layer is computed as shown in Equation 8:

h_i^{(k)} = Relu( Σ_{j=1}^{|E|} A_{ij} W^{(k)} h_j^{(k−1)} + b^{(k)} ) (8)

where Relu(·) is the nonlinear activation function, |E| denotes the number of entity nodes, W^{(k)} denotes the weight matrix of the k-th layer of the neighbor graph encoder, b^{(k)} denotes the bias term of the k-th layer, and h_j^{(k−1)} denotes the feature vector output by the (k−1)-th layer for the j-th entity node, with h_j^{(0)} = a_j;
the output of the last layer of the neighbor graph encoder is taken as the knowledge context representation vector n_i ∈ R^{d_n} of entity e_i ∈ E, where d_n denotes the output vector dimension of the last layer of the neighbor graph encoder;
for the entity node set E of entity neighbor graph K, a knowledge context matrix N_E ∈ R^{|E| × d_n} is obtained through the neighbor graph encoder; each row vector of the matrix corresponds to the knowledge context vector representation of one entity, which integrates the entity's neighbors and their attribute information;
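The layer computation of Equation 8 can be sketched without an ML framework; the helper names (`gcn_layer`, `neighbor_graph_encoder`) and the toy weights are illustrative assumptions, and any self-loops must be encoded in the adjacency matrix by the caller:

```python
def gcn_layer(A, H, W, b):
    """One graph convolution layer (Eq. 8, sketch):
    h_i = Relu( sum_j A[i][j] * (W @ h_j) + b ).

    A: adjacency matrix; H: node vectors from the previous layer;
    W: weight matrix (out_dim x in_dim); b: bias vector (out_dim).
    """
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    out = []
    for i in range(len(H)):
        agg = [0.0] * len(W)
        for j in range(len(H)):
            if A[i][j]:
                # aggregate transformed neighbor features weighted by A[i][j]
                t = matvec(W, H[j])
                agg = [a + A[i][j] * x for a, x in zip(agg, t)]
        out.append([max(0.0, a + bk) for a, bk in zip(agg, b)])  # Relu
    return out

def neighbor_graph_encoder(A, H0, layers):
    """Stack of graph convolution layers (two in the patent); the last
    layer's output is the knowledge context representation of each node."""
    H = H0
    for W, b in layers:
        H = gcn_layer(A, H, W, b)
    return H
```

The same sketch applies to the constraint graph encoder of step 5, which differs only in its adjacency matrix, weights, and randomly initialized inputs.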
in step 5, an adjacency matrix is constructed from the constraint graph of step 1; with this adjacency matrix and the vector representations of entity types and relations as input, the vector representations of the entity types and relations are obtained through a constraint graph encoder constructed from a graph convolutional neural network;
for the constraint graph G = {T, R, C}, the nodes of the graph comprise entity type nodes and relation nodes, and the node set of the constraint graph is V_G = T ∪ R; the edge set of constraint graph G is defined as D_G: each constraint (t_1, r, t_2) ∈ C corresponds to the two edges (t_1, r) and (r, t_2) in D_G; each element of the adjacency matrix A_G ∈ R^{|V_G| × |V_G|} of constraint graph G is computed as shown in Equation 9:

A_{G,ij} = 1 if (v_i, v_j) ∈ D_G, and A_{G,ij} = 0 otherwise (9)

where |V_G| denotes the number of nodes of the constraint graph; v_i, v_j ∈ V_G are any two nodes of the constraint graph node set; and (v_i, v_j) ∈ D_G indicates that there is an edge between nodes v_i and v_j in the constraint graph;
a two-layer graph convolutional neural network is selected as the constraint graph encoder, and the initial input vector of each node of the node set V_G of the constraint graph is randomly initialized; for node v_i ∈ V_G, the output g_i^{(k)} of the k-th graph convolution layer is computed as shown in Equation 10:

g_i^{(k)} = Relu( Σ_{j=1}^{|V_G|} A_{G,ij} M^{(k)} g_j^{(k−1)} + q^{(k)} ) (10)

where Relu(·) is the nonlinear activation function, |V_G| denotes the number of nodes of the constraint graph, M^{(k)} denotes the weight matrix of the k-th layer of the constraint graph encoder, q^{(k)} denotes the bias term of the k-th layer, and g_j^{(k−1)} denotes the feature vector output by the (k−1)-th layer for the j-th node;
the output of the last layer of the constraint graph encoder is taken as the feature vector representation of node v_i ∈ V_G; for the node set V_G of constraint graph G, a feature matrix H_G ∈ R^{|V_G| × d_g} is obtained after the constraint graph encoder, where d_g denotes the output vector dimension of the last layer of the constraint graph encoder; each row vector of the feature matrix corresponds to the vector representation of one node of the node set V_G of the constraint graph;
the node set V_G comprises entity type nodes and relation nodes; by splitting the feature matrix H_G, the feature matrix T' ∈ R^{n_t × d_g} of the entity type node set T and the feature matrix R' ∈ R^{n_r × d_g} of the relation node set R are obtained, where n_t denotes the number of entity types of the entity type set T and n_r denotes the number of relations of the relation node set R;
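The construction of the node set V_G = T ∪ R and the edge set D_G from the constraints can be sketched as follows; the function name and the example type/relation labels are illustrative assumptions:

```python
def build_constraint_graph(constraints):
    """Build the constraint graph's node and edge sets (sketch).

    constraints: iterable of (head_type, relation, tail_type) triples.
    Each constraint (t1, r, t2) contributes the two edges (t1, r) and (r, t2).
    Returns (type nodes T, relation nodes R, edge set D_G).
    """
    types, relations, edges = set(), set(), set()
    for t1, r, t2 in constraints:
        types.update((t1, t2))
        relations.add(r)
        edges.add((t1, r))   # edge between head-type node and relation node
        edges.add((r, t2))   # edge between relation node and tail-type node
    return types, relations, edges
```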
in step 6, with the sentence feature vectors obtained in step 2, the entity knowledge context vectors obtained in step 4, and the vector representations of entity types and relations obtained in step 5 as input, the feature vector representation of the sentence bag is computed through the multi-source fusion attention mechanism;
for a sentence bag B = {s_1, s_2, …, s_{n_b}}, where n_b is the number of sentences of bag B, the corresponding target entity pair is (e_h, e_t) and the corresponding relation label is r ∈ R;
for each sentence s_i ∈ B, the feature vector s_i of the sentence is obtained through the sentence encoder; for the target entity pair (e_h, e_t) corresponding to the bag, the knowledge context vector representations n_h and n_t of entities e_h and e_t are obtained through the entity attribute encoder and the neighbor graph encoder; for the bag's target entity pair (e_h, e_t) and relation label r, the corresponding entity type vectors t_h, t_t and relation vector r ∈ R' are obtained through the constraint graph encoder;
the multi-source fusion attention mechanism adds the knowledge context vectors and entity type vectors of the entities into the feature vectors of the sentences and the relation, thereby providing more background knowledge and constraint information for relation extraction; the final vector representation b_i of sentence s_i ∈ B is obtained by concatenating the sentence feature vector s_i, the entity type vectors t_h, t_t of the target entity pair, and the knowledge context vectors n_h, n_t of the target entity pair, as shown in Equation 11:

b_i = [s_i; t_h; t_t; n_h; n_t] ∈ R^{d_s} (11)

where d_s = 3d_c + 2d_g + 2d_n: vector s_i has dimension 3d_c, each entity type vector has dimension d_g, and each knowledge context vector has dimension d_n; ";" denotes the vector concatenation operation;
the final vector representation r_f of the relation r ∈ R is computed as shown in Equation 12, where Linear(·) denotes a linear fully-connected layer whose purpose is to align the vector dimension of r_f with that of the final sentence vector representation; ";" denotes the vector concatenation operation;
after obtaining the final vector representation b_i of each sentence s_i ∈ B and the final vector representation r_f of the relation label r, the feature vector representation b of sentence bag B is obtained through the attention mechanism, as shown in Equation 13:

b = Σ_{i=1}^{n_b} α_i b_i (13)

where n_b is the number of sentences of bag B and α_i is the attention weight of the final vector representation b_i of sentence s_i, computed as shown in Equation 14:

α_i = exp(e_i) / Σ_{j=1}^{n_b} exp(e_j) (14)

where e_i is the matching score between the sentence vector b_i and the relation label r of the bag, computed as shown in Equation 15:

e_i = b_i · r_f (15)

where "·" denotes the vector dot product operation;
in step 7, from the feature vector representation of the sentence bag obtained in step 6, the relation label of the sentence bag is predicted through the relation classifier;
after the feature vector b of bag B is obtained through the multi-source fusion attention mechanism, the relation classifier predicts the relation label of bag B; the prediction score o_i of bag B for each relation r_i in the relation set R is computed as shown in Equation 16:

o_i = b · r_f^i + d_i (16)

where r_f^i denotes the final vector representation of relation r_i and d_i denotes a bias term;
after the prediction score o_i of bag B for each relation r_i in the relation set R is obtained, the probability P that bag B is classified into relation r_i is computed through the softmax function, as shown in Equation 17:

P(r_i | B; θ) = exp(o_i) / Σ_{j=1}^{n_r} exp(o_j) (17)
5. The remote supervision relation extraction method fusing knowledge and constraint graphs as claimed in claim 4, wherein the relation extraction model is trained using the cross-entropy loss function as the loss function, computed as shown in Equation 18:

J(θ) = − Σ_{i=1}^{n_B} log P(r_i | B_i; θ) (18)

where n_B denotes the number of sentence bags in the training set used, r_i denotes the relation label of sentence bag B_i, and θ denotes the parameters of the relation extraction model, comprising all parameters of the above steps: the parameters of the sentence encoder in step 2, the attribute encoder in step 3, the entity neighbor graph encoder in step 4, the constraint graph encoder in step 5, the multi-source fusion attention mechanism in step 6, and the relation classifier in step 7; the parameters include the weight matrices and bias terms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211185558.2A CN115545005A (en) | 2022-09-27 | 2022-09-27 | Remote supervision relation extraction method fusing knowledge and constraint graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115545005A true CN115545005A (en) | 2022-12-30 |
Family
ID=84730059
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211185558.2A Pending CN115545005A (en) | 2022-09-27 | 2022-09-27 | Remote supervision relation extraction method fusing knowledge and constraint graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115545005A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116737965A (en) * | 2023-08-11 | 2023-09-12 | 深圳市腾讯计算机系统有限公司 | Information acquisition method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108959252B (en) | Semi-supervised Chinese named entity recognition method based on deep learning | |
CN110969020B (en) | CNN and attention mechanism-based Chinese named entity identification method, system and medium | |
CN111753024B (en) | Multi-source heterogeneous data entity alignment method oriented to public safety field | |
CN114021584B (en) | Knowledge representation learning method based on graph convolution network and translation model | |
CN113128229A (en) | Chinese entity relation joint extraction method | |
CN111881677A (en) | Address matching algorithm based on deep learning model | |
CN109165275B (en) | Intelligent substation operation ticket information intelligent search matching method based on deep learning | |
CN111274804A (en) | Case information extraction method based on named entity recognition | |
CN107679110A (en) | The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction | |
CN112733866A (en) | Network construction method for improving text description correctness of controllable image | |
CN113407660B (en) | Unstructured text event extraction method | |
CN109960728A (en) | A kind of open field conferencing information name entity recognition method and system | |
CN112966525B (en) | Law field event extraction method based on pre-training model and convolutional neural network algorithm | |
CN110555084A (en) | remote supervision relation classification method based on PCNN and multi-layer attention | |
CN111259197B (en) | Video description generation method based on pre-coding semantic features | |
CN112069825B (en) | Entity relation joint extraction method for alert condition record data | |
CN111460097A (en) | Small sample text classification method based on TPN | |
CN112989833A (en) | Remote supervision entity relationship joint extraction method and system based on multilayer LSTM | |
CN114048314B (en) | Natural language steganalysis method | |
CN115545005A (en) | Remote supervision relation extraction method fusing knowledge and constraint graph | |
CN114841151A (en) | Medical text entity relation joint extraction method based on decomposition-recombination strategy | |
CN114925205A (en) | GCN-GRU text classification method based on comparative learning | |
CN112905793B (en) | Case recommendation method and system based on bilstm+attention text classification | |
CN114528368A (en) | Spatial relationship extraction method based on pre-training language model and text feature fusion | |
CN115186670B (en) | Method and system for identifying domain named entities based on active learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||