CN115545005A - Remote supervision relation extraction method fusing knowledge and constraint graph


Info

Publication number
CN115545005A
CN115545005A
Authority
CN
China
Prior art keywords
entity
vector
sentence
graph
matrix
Prior art date
Legal status
Pending
Application number
CN202211185558.2A
Other languages
Chinese (zh)
Inventor
刘琼昕
牛文涛
王佳升
王甜甜
方胜
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN202211185558.2A
Publication of CN115545005A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/205 Parsing
    • G06F 16/355 Class or cluster creation or modification
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/295 Named entity recognition
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a remote supervision relation extraction method fusing knowledge and a constraint graph, belonging to the technical field of text data relation extraction in computer natural language processing. The method supplements additional information with the entity knowledge context, transfers information between relations through entity types and a relation constraint graph, and fuses sentence semantic information, entity context information, and entity relation constraint information through a multi-source fusion attention mechanism, which benefits the representation learning of sentences and entity relations and improves the relation extraction effect. The method solves the data noise problem and the long-tail relation problem of remote supervision relation extraction simultaneously, is particularly suitable for relation extraction on large-scale text data and in complex text environments, and is effective for extracting structured factual information from unstructured text.

Description

Remote supervision relation extraction method fusing knowledge and constraint graph
Technical Field
The invention relates to a remote supervision relation extraction method fusing knowledge and a constraint graph, and belongs to the technical field of text data relation extraction in computer natural language processing.
Background
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence, covering the theories and methods that enable effective communication between people and computers in natural language.
With the development of artificial intelligence technology, relation extraction based on machine learning and deep learning has become a hotspot of computer natural language processing. At present, training relation extraction models on large-scale text corpora to automatically mine the deep semantic features of language text, thereby realizing intelligent semantic analysis and relation extraction, has broken through the limitations of traditional relation extraction methods based on manually designed rules and has become the main means of relation extraction technology. Performing relation extraction automatically through deep learning on large-scale text data and in complex text environments realizes the extraction of structured factual information from unstructured text, with broad application prospects in natural language processing tasks such as question answering systems, knowledge graphs, and search engines.
In recent years, relation extraction based on remote supervision has been a popular direction of deep-learning-based relation extraction. This approach aligns a large knowledge base with a corpus to automatically obtain a large amount of labeled text data for training a relation extraction model. Specifically, if a "<head entity, tail entity, relationship>" triple exists in the knowledge base, remote supervision assumes that every sentence in the corpus containing both the head entity and the tail entity expresses the relationship of the triple. Compared with other relation extraction methods, remote-supervision-based relation extraction trains on a large amount of automatically labeled data, which alleviates the difficulty of obtaining relation extraction training data, and the trained relation model has strong generalization ability.
However, datasets automatically annotated by remote supervision present two serious problems. The first is the noise problem: many sentences in the labeled dataset are labeled incorrectly, which impairs the training of the relation extraction model. The second is the long-tail relation problem: a few relations account for most of the data, so the relation extraction model cannot fully learn the long-tail relations. Existing remote supervision relation extraction methods find it difficult to solve both problems of the remote supervision dataset, or can only focus on one of them, so that effective training of the model cannot be realized, the actual effect of relation extraction is poor, and the mining of deep semantic features of the text is seriously affected.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and, in order to effectively solve the data noise and long-tail problems in remote supervision relation extraction, creatively provides a remote supervision relation extraction method fusing knowledge and a constraint graph.
The innovations of the method are as follows: additional information is supplemented with the entity knowledge context; information is transferred between relations through entity types and the relation constraint graph; and a multi-source fusion attention mechanism fuses sentence semantic information, entity context information, and entity relation constraint information, which benefits the representation learning of sentences and entity relations and improves the relation extraction effect. For unstructured text annotated with entity information, the method can effectively identify the relations between entity pairs, and it is particularly effective for relation extraction on large-scale text data, in complex text environments, and for extracting structured factual information from unstructured text.
The invention is realized by adopting the following technical scheme.
A remote supervision relation extraction method fusing knowledge and constraint graphs comprises the following steps:
step 1: and collecting the neighbor entities of the entities in the remote supervision data set in the knowledge base, wherein the neighbor entities comprise one-hop neighbor entities and two-hop neighbor entities. An entity set is formed by entities in a remote supervision data set and neighbor entities thereof, an entity neighbor graph is constructed by using the entity set, and a constraint graph is constructed by combining a relationship set between the entities. For remote surveillance data sets, sentences with identical entity pairs are combined into sentence packets.
Wherein, for the entities in the remote supervision dataset, the Flair named entity recognition tool can be used to identify their entity types, which constitute the entity type set.
Step 2: Obtain the word embedding vector of each word in the sentences in the packet, as well as the feature vector representation of each sentence.
Specifically, for the sentence packets obtained in step 1, the word embedding vector of each word in a sentence is obtained through the word2vec tool, and the feature vector representation of the sentence is obtained with a segmented convolutional neural network (PCNN) as the sentence encoder.
And step 3: and collecting entity attribute information of each entity in the entity set in a knowledge base by using an attribute encoder, wherein the entity attribute information comprises an entity name, an entity alias, an entity type and an entity description. And each entity obtains the attribute vector of the corresponding entity by splicing the attribute information and inputting the attribute information into an attribute encoder, then outputting a matrix and adopting column vector averaging.
In particular, BERT may be used as the attribute encoder.
Step 4: Construct an adjacency matrix from the entity neighbor graph, and with the adjacency matrix and the entity attribute vectors as input, obtain the knowledge context vector representation of the target entity through a neighbor graph encoder built from a graph convolutional neural network.
And 5: and constructing an adjacency matrix by using the constraint graph, and obtaining vector representations of the entity types and the relationships by using the adjacency matrix, the entity types and the vector representations of the relationships as input through a constraint graph encoder constructed by a graph convolution neural network.
Step 6: and taking the sentence characteristic vector representation, the entity context vector representation and the vector representation of the entity type and the relation as input, and calculating to obtain the characteristic vector representation of the sentence packet through a multi-source fusion attention mechanism.
And 7: and expressing the feature vector of the sentence packet, and predicting the relation label of the sentence packet through a relation classifier.
Further, in step 1, the knowledge base contains entity pairs and the relations between them, and each entity has attribute information such as its entity name, entity alias, entity type, and entity description. The remote supervision dataset is a training corpus annotated by the remote supervision method, in which natural language text is labeled using the entity pairs in the knowledge base and their corresponding relations. Specifically, assuming a "<head entity, tail entity, relationship>" triple exists in the knowledge base, any sentence containing both the head entity and the tail entity is considered to express the relationship of the triple, thereby obtaining the annotated data.
Further, in step 1, the entity type set can use the 18 entity types defined in OntoNotes 5.0 plus one special entity type "Other", for a total of 19 entity types, and the Flair named entity recognition tool is used to identify the types of the entities in the dataset.
Further, in step 2, the segmented convolutional neural network (PCNN) is a neural network model that takes the feature vector sequence of the words of a sentence as input and generates a sentence feature vector representation through convolution and segmented pooling based on the positions of the two entities in the sentence.
Further, in step 3, BERT is a deep neural network constructed based on a multi-head self-attention mechanism and pre-trained using a large number of corpora, and can output a feature vector representation with semantic information for an input text.
Further, in step 4 and step 5, the graph convolution neural network is a neural network that performs information propagation and aggregation on graph data through convolution operation, thereby extracting feature information.
Further, in step 6, the multi-source fusion attention mechanism is an attention-based information fusion scheme first proposed in the present invention, which fuses the semantic information of sentences, the entity knowledge context, and the constraint information of entity relations to obtain the feature vector representation of the sentence packet.
Further, in step 7, the relation classifier is implemented as follows: the feature vector representation of the sentence packet and the feature representation of each relation in the relation set are taken as input; a prediction score is calculated through a dot product operation, and the probability that the sentence packet is classified into each relation is then calculated through a softmax operation.
Advantageous effects
Compared with the prior art, the invention has the following advantages:
First, by integrating the information of entities in the knowledge base into the relation extraction model as a knowledge context supplement for the entities, the method helps the model judge whether a remote supervision annotation is correct, effectively reducing the data noise in remote supervision relation extraction.
Second, by using the constraint graph of entity types and inter-entity relations, the method realizes indirect connections between different relations through entity types, helping information propagate between relations and alleviating the long-tail relation problem of remote supervision relation extraction.
Third, the method fuses the entity knowledge context and the constraint graph information with the in-packet sentence information through multi-source fusion attention; the resulting packet feature vector representation better reflects the relation of the target entity pair, solving the data noise problem and the long-tail relation problem of remote supervision relation extraction simultaneously.
Drawings
FIG. 1 is a general framework schematic of the process of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
A remote supervision relation extraction method fusing knowledge and constraint graphs is shown in figure 1 and comprises the following steps:
step 1: and collecting one-hop or two-hop neighbor entities of the entities in the remote supervision data set in the knowledge base, wherein the entities in the remote supervision data set and the neighbor entities thereof form an entity set, and the entity set is used for constructing an entity neighbor graph. And identifying the entity types of the entities by using a Flair named entity identification tool to form an entity type set according to the entities in the remote supervision data set, and constructing a constraint graph by combining the relationship set among the entities. For remote supervisory data sets, sentences having the same entity pairs are combined into sentence packets.
Specifically, the entity neighbor graph is defined as a graph K = {E, N}, where E represents the set of entity nodes, that is, the entity set, and N represents the set of edges. If two entities e_1, e_2 ∈ E appear in the same triple in the knowledge base, there is an edge (e_1, e_2) ∈ N.
The constraint graph is defined as a graph G = {T, R, C}, where T is the set of entity type nodes; the 18 coarse entity types defined in OntoNotes 5.0 may be used, with a special node "Other" representing entity types outside these 18, so the set T contains 19 entity type nodes. The Flair named entity recognition tool is used to identify the types of the entities in the dataset.
Let R be the set of relation nodes formed by all relations, and let C be the set of constraint edges. If entities e_1 and e_2 have entity types t_1 and t_2 respectively, and e_1 and e_2 stand in relation r, then there is a constraint c = (t_1, r, t_2) ∈ C. Each constraint c = (t_1, r, t_2) corresponds to the two edges (t_1, r) and (r, t_2) in the graph.
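As an illustration, the constraint graph construction above can be sketched as follows. This is a minimal sketch: the example triples, entity type labels, and variable names are hypothetical and not taken from the patent.

```python
# Build a constraint graph G = {T, R, C} from knowledge-base triples and
# entity types. Each constraint (t1, r, t2) contributes two edges:
# (t1, r) and (r, t2). All data below is illustrative only.
entity_type = {"Beijing": "GPE", "China": "GPE", "BIT": "ORG"}
triples = [("Beijing", "capital_of", "China"),
           ("BIT", "located_in", "Beijing")]

type_nodes = sorted(set(entity_type.values()))       # node set T
relation_nodes = sorted({r for _, r, _ in triples})  # node set R
constraints = set()                                  # constraint set C
edges = set()                                        # edge set of G
for h, r, t in triples:
    t1, t2 = entity_type[h], entity_type[t]
    constraints.add((t1, r, t2))
    edges.add((t1, r))   # entity-type-to-relation edge
    edges.add((r, t2))   # relation-to-entity-type edge
```

Each relation node is thus connected only to type nodes, so two different relations sharing an argument type become two-hop neighbors, which is how the graph lets information pass between relations.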
And 2, step: for the sentence packet obtained in the step 1, the word embedding vector of each word in the sentence can be obtained by a word2vec tool in the sentence packet, and the feature vector representation of the sentence can be obtained by taking a segmented convolutional neural network (PCNN) as a sentence encoder.
In particular, for a sentence s = {w_1, w_2, ..., w_{n_s}} in the packet, where n_s is the length of sentence s, the input of each word w_i ∈ s consists of its word embedding vector and its position feature vectors. The word embedding vector v_i, of dimension d_w, can be obtained by word2vec pre-training. The position feature vectors p_i^h and p_i^t, each of dimension d_p, are embeddings of the two relative distances from word w_i to the target entity pair (e_h, e_t) in the sentence; the relative distance of each word is computed with respect to the position where each entity first appears in the sentence.

The input representation w_i of word w_i is obtained by concatenation, as shown in Equation 1:

w_i = [v_i; p_i^h; p_i^t], w_i ∈ R^d, d = d_w + 2 d_p   (1)

where ";" denotes the vector concatenation operation. The input of sentence s is then represented as a matrix X ∈ R^{l_s × d}, where l_s is the sentence length.
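Under Equation 1, the input matrix X can be assembled as in the following numpy sketch. The toy dimensions (d_w = 4, d_p = 2), the example sentence, and the randomly initialized embedding tables are stand-ins for trained word2vec and position embeddings, not the patent's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_w, d_p, max_dist = 4, 2, 10
sentence = ["bit", "is", "located", "in", "beijing"]
head_pos, tail_pos = 0, 4            # first occurrence of (e_h, e_t)

# Random stand-ins for pre-trained word2vec and position embedding tables.
word_emb = {w: rng.normal(size=d_w) for w in set(sentence)}
pos_emb = rng.normal(size=(2 * max_dist + 1, d_p))  # distances in [-10, 10]

def pos_vec(i, entity_pos):
    # Embed the (clipped) relative distance from word i to an entity.
    dist = int(np.clip(i - entity_pos, -max_dist, max_dist))
    return pos_emb[dist + max_dist]

# Equation 1: w_i = [v_i; p_i^h; p_i^t]; stacking gives X of shape (l_s, d).
X = np.stack([np.concatenate([word_emb[w],
                              pos_vec(i, head_pos),
                              pos_vec(i, tail_pos)])
              for i, w in enumerate(sentence)])
```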
The input representation matrix X of sentence s is then encoded with the segmented convolutional neural network (PCNN) to obtain a sentence feature vector of fixed dimension.
The segmented convolutional neural network includes a convolutional layer and a segmented max-pooling layer. The parameter matrix of a convolution kernel is W ∈ R^{w × d}, where w represents the length of the convolution sliding window. The sub-matrix q_m of matrix X under the m-th sliding window is shown in Equation 2:

q_m = X_{m-w+1:m}  (1 ≤ m ≤ l_s + w - 1)   (2)

where l_s represents the length of sentence s, and m-w+1:m represents the index interval, within the full word sequence of the original sentence, covered by the sliding window.
The m-th element of the feature vector is then the convolution of the sub-matrix q_m with the kernel parameter matrix W, as shown in Equation 3:

c_m = W ⊗ q_m   (3)

where ⊗ represents the convolution operation.

Specifically, during convolution, the window slides with stride 1, and the parts of the convolution window exceeding the sentence boundary are filled with zero vectors, finally yielding a feature vector c ∈ R^{l_s + w - 1} representing matrix X.

In practice, to capture different features of a sentence, multiple convolution kernels are usually used to convolve the sentence representation matrix. The invention adopts d_c convolution kernels, denoted {W_1, W_2, ..., W_{d_c}}. After the convolution calculation, the representation matrix X corresponds to d_c feature vectors {c_1, c_2, ..., c_{d_c}}.
The segmented max-pooling layer of the segmented convolutional neural network takes the positions of the entity pair in sentence s as split points, dividing each feature vector into 3 parts, and then applies a max-pooling operation to each part separately. Specifically, any feature vector c_i ∈ {c_1, ..., c_{d_c}} is split into 3 feature sub-vectors {c_{i,1}; c_{i,2}; c_{i,3}}. Max pooling over each feature sub-vector yields a pooled feature vector f_i of dimension 3, as shown in Equation 4:

f_i = [max(c_{i,1}); max(c_{i,2}); max(c_{i,3})]   (4)

where max(·) represents the maximum operation.

After the d_c feature vectors {c_1, ..., c_{d_c}} corresponding to matrix X pass through the segmented max-pooling layer, the resulting pooled feature vectors are concatenated and passed through the activation function tanh(·) to obtain the feature vector representation s ∈ R^{3 d_c} of X, as shown in Equation 5:

s = tanh([f_1; f_2; ...; f_{d_c}])   (5)

where f_1, ..., f_{d_c} denote the d_c pooled feature vectors. The vector s thus obtained is the feature vector representation of sentence s.
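Equations 2 to 5 can be sketched end to end in numpy as follows. This is a minimal sketch with toy dimensions and random kernels standing in for trained parameters; a real implementation would use a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(1)
l_s, d, w, d_c = 7, 8, 3, 5          # sentence length, input dim, window, kernels
X = rng.normal(size=(l_s, d))        # sentence input matrix (Equation 1)
kernels = rng.normal(size=(d_c, w, d))
head_pos, tail_pos = 1, 5            # entity positions used as split points

# Equations 2-3: stride-1 convolution with zero padding beyond the boundary,
# giving d_c feature vectors of length l_s + w - 1.
Xp = np.vstack([np.zeros((w - 1, d)), X, np.zeros((w - 1, d))])
C = np.array([[np.sum(W * Xp[m:m + w]) for m in range(l_s + w - 1)]
              for W in kernels])

def pool(c):
    # Equation 4: split at the entity positions, max-pool each piece.
    parts = [c[:head_pos + 1], c[head_pos + 1:tail_pos + 1], c[tail_pos + 1:]]
    return [p.max() for p in parts]

# Equation 5: concatenate the d_c pooled vectors and apply tanh.
s = np.tanh(np.concatenate([pool(c) for c in C]))  # sentence vector, dim 3*d_c
```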
And step 3: and (2) collecting entity name, entity alias, entity type and entity description four entity attribute information in a knowledge base for each entity in the entity set in the step 1 by using BERT as an attribute encoder. And each entity inputs the attribute information into BERT by splicing, then outputs a matrix, and obtains the attribute vector of the corresponding entity by adopting column vector averaging.
Specifically, for each entity e_i ∈ E, the corresponding attribute vector is a_i ∈ R^{d_a}, where d_a represents the dimension of the entity attribute vector. The attribute vector a_i of entity e_i is calculated as shown in Equation 6:

a_i = Mean(BERT([CLS] name_i [SEP] alias_i [SEP] type_i [SEP] description_i))   (6)

where Mean(·) represents the column-averaging operation; BERT(·) represents the output matrix calculated with the pre-trained BERT model; name_i represents the entity name of e_i, alias_i its entity alias, type_i its entity type in the knowledge base, and description_i its entity description. The attribute fields are separated by the symbol [SEP], and [CLS] is the start identifier added for BERT input.
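The column averaging of Equation 6 reduces the encoder's token-level output matrix to a single attribute vector. In the minimal sketch below, a random matrix stands in for the BERT hidden states of the concatenated attribute string; the dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
num_tokens, d_a = 12, 16

# Stand-in for BERT(...): one hidden vector per token of
# "[CLS] name [SEP] alias [SEP] type [SEP] description".
hidden_states = rng.normal(size=(num_tokens, d_a))

# Mean(...): average over the token axis to get the entity attribute vector.
a_i = hidden_states.mean(axis=0)
```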
And 4, step 4: and (3) constructing an adjacent matrix by using the entity neighbor graph in the step (1), and obtaining the knowledge context vector representation of the target entity by using the adjacent matrix and the entity attribute vector obtained in the step (3) as input through a neighbor graph encoder constructed by a graph convolution neural network.
The target entity may be any entity in the entity set of step 1. For the entity neighbor graph K = {E, N}, with entity node set E and edge set N, the adjacency matrix A ∈ R^{|E| × |E|} is calculated as shown in Equation 7:

A_ij = 1 if (v_i, v_j) ∈ N, and A_ij = 0 otherwise   (7)

where |E| represents the number of entity nodes; v_i, v_j ∈ E represent any two entity nodes in the entity node set; and (v_i, v_j) ∈ N indicates that an edge exists between the nodes v_i and v_j in the entity neighbor graph.
A two-layer graph convolutional neural network is selected as the neighbor graph encoder, and the entity attribute vector a_i obtained in step 3 is used as the initial vector of entity e_i ∈ E in the entity neighbor graph. For node v_i ∈ E, the output h_i^(k) of the k-th graph convolution layer is calculated as shown in Equation 8:

h_i^(k) = Relu( Σ_{j=1}^{|E|} A_ij W^(k) h_j^(k-1) + b^(k) )   (8)

where Relu(·) is a nonlinear activation function, |E| represents the number of entity nodes, W^(k) represents the weight matrix of the k-th layer of the neighbor graph encoder, b^(k) represents the bias term of the k-th layer, and h_j^(k-1) represents the feature vector output by the (k-1)-th layer for the j-th entity node, with h_j^(0) = a_j. The output of the last layer of the neighbor graph encoder is selected as the knowledge context vector representation n_i ∈ R^{d_n} of entity e_i ∈ E, where d_n represents the output vector dimension of the last layer of the neighbor graph encoder.
For the entity node set E of the entity neighbor graph K, a knowledge context matrix in R^{|E| × d_n} is obtained after the neighbor graph encoder. Each row vector of this matrix is the knowledge context vector representation of one entity, fusing the entity's neighbors and their attribute information.
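Equations 7 and 8 amount to a two-layer GCN over the entity neighbor graph. The numpy sketch below uses toy dimensions, a hypothetical edge list, and random weights in place of the trained encoder parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
num_entities, d_a, d_hidden, d_n = 4, 6, 5, 3
edges = [(0, 1), (1, 2), (2, 3)]     # entity neighbor graph edge set N

# Equation 7: symmetric adjacency matrix built from the edge set.
A = np.zeros((num_entities, num_entities))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

attrs = rng.normal(size=(num_entities, d_a))  # a_i from the attribute encoder

def gcn_layer(H, W, b):
    # Equation 8: h_i = Relu(sum_j A_ij * W h_j + b), vectorized over nodes.
    return np.maximum(A @ H @ W.T + b, 0.0)

H1 = gcn_layer(attrs, rng.normal(size=(d_hidden, d_a)), rng.normal(size=d_hidden))
N_ctx = gcn_layer(H1, rng.normal(size=(d_n, d_hidden)), rng.normal(size=d_n))
# Each row of N_ctx is the knowledge context vector n_i of one entity.
```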
And 5: and (2) constructing an adjacency matrix by using the constraint graph in the step (1), and using the adjacency matrix, the entity type and the vector representation of the relationship as input to a constraint graph encoder constructed by a graph convolution neural network to obtain the vector representation of the entity type and the relationship.
Specifically, for the constraint graph G = {T, R, C}, the nodes in the graph include entity type nodes and relation nodes, so the node set of the constraint graph is V_G = T ∪ R. Define the edge set of the constraint graph as D_G: each constraint c = (t_1, r, t_2) ∈ C corresponds to the two edges (t_1, r) and (r, t_2) in D_G. The adjacency matrix A_G ∈ R^{|V_G| × |V_G|} of constraint graph G is calculated as shown in Equation 9:

(A_G)_ij = 1 if (v_i, v_j) ∈ D_G, and (A_G)_ij = 0 otherwise   (9)

where |V_G| represents the number of nodes of the constraint graph; v_i, v_j ∈ V_G represent any two nodes in the constraint graph node set; and (v_i, v_j) ∈ D_G indicates that an edge exists between the nodes v_i and v_j in the constraint graph.
A two-layer graph convolutional neural network is selected as the constraint graph encoder, and the initial input vector of each node in the node set V_G is randomly initialized. For node v_i ∈ V_G, the output g_i^(k) of the k-th graph convolution layer is calculated as shown in Equation 10:

g_i^(k) = Relu( Σ_{j=1}^{|V_G|} (A_G)_ij M^(k) g_j^(k-1) + q^(k) )   (10)

where Relu(·) is a nonlinear activation function, |V_G| represents the number of nodes of the constraint graph, M^(k) represents the weight matrix of the k-th layer of the constraint graph encoder, q^(k) represents the bias term of the k-th layer, and g_j^(k-1) represents the feature vector output by the (k-1)-th layer for the j-th node.

The output of the last layer of the constraint graph encoder is selected as the feature vector representation of node v_i ∈ V_G. For the node set V_G of constraint graph G, a feature matrix F_G ∈ R^{|V_G| × d_g} is obtained after the constraint graph encoder, where d_g represents the output vector dimension of the last layer of the constraint graph encoder. Each row vector of F_G is the vector representation of one node in the node set V_G.

The node set V_G includes entity type nodes and relation nodes. By splitting the feature matrix F_G, the feature matrix T' ∈ R^{|n_t| × d_g} of the entity type node set T and the feature matrix R' ∈ R^{|n_r| × d_g} of the relation node set R are obtained, where |n_t| represents the number of entity types in the entity type set T and |n_r| represents the number of relations in the relation node set R.
And 6: and (4) expressing the sentence characteristic vector obtained in the step (2), expressing the entity context vector obtained in the step (4) and expressing the entity type and relationship vector obtained in the step (5) as input, and calculating to obtain the characteristic vector expression of the sentence packet through a multi-source fusion attention mechanism.
In particular, consider a sentence packet B = {s_1, s_2, ..., s_{n_b}}, where n_b is the number of sentences in packet B, with corresponding target entity pair (e_h, e_t) and corresponding relation label r ∈ R.

For each sentence s_i ∈ B, the sentence encoder yields its feature vector s_i. For the target entity pair (e_h, e_t) of the packet, the entity attribute encoder and the neighbor graph encoder yield the knowledge context vector representations n_h and n_t of entities e_h and e_t. For the target entity pair (e_h, e_t) and the relation label r of the packet, the constraint graph encoder yields the corresponding entity type vectors t_h and t_t and the relation vector r ∈ R'.
The multi-source fusion attention mechanism adds the knowledge context vectors and entity type vectors of the entities to the feature vectors of the sentences and the relations, thereby providing more background knowledge and constraint information for relation extraction. Specifically, the final vector representation ŝ_i of sentence s_i ∈ B is obtained by concatenating the sentence feature vector s_i, the entity type vectors t_h and t_t of the target entity pair, and the knowledge context vectors n_h and n_t of the target entity pair, as shown in Equation 11:

ŝ_i = [s_i; t_h; t_t; n_h; n_t], ŝ_i ∈ R^{d_s}, d_s = 3 d_c + 2 d_g + 2 d_n   (11)

where vector s_i has dimension 3 d_c, vectors t_h and t_t have dimension d_g, and vectors n_h and n_t have dimension d_n; ";" denotes the vector concatenation operation.
The final vector representation r_f of the relation r ∈ R is calculated as shown in Equation 12:

r_f = Linear([r; t_h; t_t; n_h; n_t])   (12)

where Linear(·) denotes a linear fully connected layer whose purpose is to align the vector dimension of r_f with that of ŝ_i, and ";" denotes the vector concatenation operation.
After obtaining the final vector representation s_i^f of each sentence s_i ∈ B and the final vector representation r_f of the relation label r, the feature vector representation b of sentence packet B is obtained through the attention mechanism, as shown in formula 13:

b = Σ_{i=1..n_b} α_i · s_i^f  (13)

where n_b is the number of sentences in sentence packet B and α_i is the attention weight of the final vector representation s_i^f of sentence s_i, calculated as shown in formula 14:

α_i = exp(e_i) / Σ_{j=1..n_b} exp(e_j)  (14)

where e_i is the matching score between the sentence vector s_i^f and the relation label r of the packet, calculated as shown in formula 15:

e_i = s_i^f · r_f  (15)

where "·" denotes the vector dot product operation.
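As an illustrative sketch (not part of the claimed method), formulas 11-15 can be exercised in NumPy; all dimensions below are toy values, the input vectors are random stand-ins for the encoder outputs of steps 2-5, and the linear alignment layer of formula 12 is randomly initialised rather than trained:

```python
import numpy as np

rng = np.random.default_rng(0)
d_c, d_g, d_n = 4, 3, 2                  # toy dimensions
n_b = 3                                  # sentences in the packet

# Random stand-ins for the encoder outputs of steps 2-5.
s = rng.normal(size=(n_b, 3 * d_c))      # sentence feature vectors s_i
t_h, t_t = rng.normal(size=d_g), rng.normal(size=d_g)  # entity type vectors
n_h, n_t = rng.normal(size=d_n), rng.normal(size=d_n)  # knowledge context vectors

# Formula 11: per-sentence final representation by concatenation.
s_f = np.stack([np.concatenate([s[i], t_h, t_t, n_h, n_t]) for i in range(n_b)])
d_s = 3 * d_c + 2 * d_g + 2 * d_n

# Formula 12 (sketch): a linear layer aligns the relation vector with d_s.
r_prime = rng.normal(size=d_g)
W_lin, b_lin = rng.normal(size=(d_s, d_g)), rng.normal(size=d_s)
r_f = W_lin @ r_prime + b_lin

# Formulas 15, 14, 13: match scores, softmax weights, weighted packet vector.
e = s_f @ r_f
alpha = np.exp(e - e.max()) / np.exp(e - e.max()).sum()
bag = alpha @ s_f
print(bag.shape)
```

The resulting `bag` vector has dimension d_s and plays the role of the packet representation b used by the relation classifier in step 7.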
And 7: and (4) expressing the feature vector of the sentence packet obtained in the step (6), and predicting the relation label of the sentence packet through a relation classifier.
After the feature vector b of packet B is obtained through the multi-source fusion attention mechanism, the relation classifier predicts the relation label of sentence packet B. The prediction score of sentence packet B with respect to each relation r_i in the relation set R is calculated as shown in formula 16:

o_i = b · r_i^f + b_i  (16)

where r_i^f denotes the final vector representation of relation r_i and b_i denotes a bias term.
After the prediction score of sentence packet B with respect to each relation r_i in the relation set R is obtained, the probability P that packet B is classified into relation r_i is calculated through the softmax function, as shown in formula 17:

P(r_i | B; θ) = exp(o_i) / Σ_{j=1..|n_r|} exp(o_j)  (17)

where |n_r| denotes the number of relations in the relation set R, o_j denotes the prediction score of sentence packet B with respect to the j-th relation r_j in R, and θ denotes the parameters of the relation extraction model.
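A minimal sketch of the classification step of formulas 16-17, with random stand-ins for the packet vector and the relation vectors (the real vectors would come from the attention mechanism and the constraint graph encoder):

```python
import numpy as np

rng = np.random.default_rng(1)
d_s, n_r = 8, 5                      # toy packet dimension and relation count
bag = rng.normal(size=d_s)           # packet feature vector b from step 6
R_f = rng.normal(size=(n_r, d_s))    # final vectors r_i^f of all relations
bias = rng.normal(size=n_r)          # bias terms b_i

# Formula 16: dot-product score of the packet against every relation.
scores = R_f @ bag + bias

# Formula 17: softmax over the relation set gives a probability distribution.
p = np.exp(scores - scores.max())
p /= p.sum()
pred = int(np.argmax(p))             # predicted relation label index
print(pred)
```

The predicted label is simply the relation with the highest probability; at training time the full distribution feeds the cross-entropy loss of formula 18.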
Further, the relation extraction model in the invention is trained with the cross-entropy loss function, whose calculation formula is shown in formula 18:

J(θ) = −Σ_{i=1..n_B} log P(r_i | B_i; θ)  (18)

where n_B denotes the number of sentence packets in the training set used and r_i denotes the relation label of sentence packet B_i; θ denotes the parameters of the relation extraction model used in the invention and includes all the parameters in the above steps: the parameters of the sentence encoder in step 2, the parameters of the attribute encoder in step 3, the parameters of the entity neighbor graph encoder in step 4, the parameters of the constraint graph encoder in step 5, the parameters of the multi-source fusion attention mechanism in step 6, and the parameters of the relation classifier in step 7. The parameters include weight matrices and bias terms.
In the invention, the relation extraction model used by the relation extraction method can minimize the loss function J(θ) with the mini-batch stochastic gradient descent (SGD) optimization algorithm, thereby better optimizing the parameters.
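The training objective of formula 18 combined with mini-batch SGD can be illustrated on a toy linear classifier; the packet vectors and labels below are random stand-ins, so this only demonstrates that the loss decreases under the update rule, not the full model:

```python
import numpy as np

rng = np.random.default_rng(2)
n_bags, d_s, n_r = 32, 8, 4
X = rng.normal(size=(n_bags, d_s))       # stand-in packet feature vectors b
y = rng.integers(0, n_r, size=n_bags)    # stand-in relation labels r_i

W = np.zeros((n_r, d_s)); bias = np.zeros(n_r)

def loss_and_grads(Xb, yb):
    # Scores (formula 16) and softmax probabilities (formula 17).
    logits = Xb @ W.T + bias
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
    # Formula 18: J(theta) = -sum_i log P(r_i | B_i; theta).
    J = -np.log(p[np.arange(len(yb)), yb]).sum()
    p[np.arange(len(yb)), yb] -= 1.0     # gradient of J w.r.t. the logits
    return J, p.T @ Xb, p.sum(axis=0)

J_start = loss_and_grads(X, y)[0]
lr, batch = 0.1, 8
for epoch in range(50):                  # mini-batch SGD over the packets
    for i in range(0, n_bags, batch):
        _, gW, gb = loss_and_grads(X[i:i + batch], y[i:i + batch])
        W -= lr * gW / batch; bias -= lr * gb / batch
J_end = loss_and_grads(X, y)[0]
print(J_end < J_start)
```

In the full model the same gradient flows back through the attention mechanism, the graph encoders, and the sentence encoder, since θ covers all of their parameters.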

Claims (5)

1. A remote supervision relation extraction method fusing knowledge and a constraint graph, characterized by comprising the following steps:
step 1: collecting, in a knowledge base, the neighbor entities of the entities in a remote supervision data set, wherein the neighbor entities comprise one-hop neighbor entities and two-hop neighbor entities; forming an entity set from the entities in the remote supervision data set and their neighbor entities, constructing an entity neighbor graph with the entity set, and constructing a constraint graph by further combining the set of relations between the entities; for the remote supervision data set, merging sentences with the same entity pair into a sentence packet;
step 2: obtaining the word embedding vector of each word of the sentences in the packet and the feature vector representation of each sentence;
step 3: collecting, with an attribute encoder, the entity attribute information of each entity of the entity set in the knowledge base, wherein the entity attribute information comprises entity name, entity type, and entity description; for each entity, concatenating the attribute information and inputting it into the attribute encoder, which outputs a matrix, and obtaining the attribute vector of the entity by column vector averaging;
step 4: constructing an adjacency matrix from the entity neighbor graph, and, with the adjacency matrix and the entity attribute vectors as input, obtaining the knowledge context vector representation of the target entity through a neighbor graph encoder constructed with a graph convolutional neural network;
step 5: constructing an adjacency matrix from the constraint graph, and, with the adjacency matrix and the vector representations of entity types and relations as input, obtaining the vector representations of entity types and relations through a constraint graph encoder constructed with a graph convolutional neural network;
step 6: with the sentence feature vector representations, the entity context vector representations, and the entity type and relation vector representations as input, calculating the feature vector representation of the sentence packet through a multi-source fusion attention mechanism;
step 7: with the feature vector representation of the sentence packet, predicting the relation label of the sentence packet through a relation classifier.
2. The remote supervision relation extraction method fusing knowledge and a constraint graph as claimed in claim 1, wherein in step 1, the knowledge base contains entity pairs, the relations between the entity pairs, and the attribute information of each entity: entity name, entity alias, entity type, and entity description;
the remote supervision data set is a training corpus labeled by the remote supervision method: natural language text is labeled with the entity pairs and corresponding relations in the knowledge base; if a triple <head entity, tail entity, relation> exists in the knowledge base, any sentence containing both the head entity and the tail entity is assumed to express the relation of the triple, thereby obtaining labeled data.
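The labeling assumption above can be illustrated with a short Python sketch; the knowledge base, sentences, and entity names are invented for illustration only:

```python
from collections import defaultdict

# Toy knowledge base of <head entity, tail entity, relation> triples
# (all names here are hypothetical illustrations).
kb = {("Paris", "France"): "capital_of", ("Seine", "Paris"): "flows_through"}

corpus = [
    "Paris is the capital and largest city of France .",
    "The Seine runs through the heart of Paris .",
    "France won the match in Paris yesterday .",
]

def distant_label(sentences, kb):
    """Label every sentence containing both entities of a KB pair with that relation."""
    labeled = []
    for s in sentences:
        tokens = s.split()
        for (head, tail), rel in kb.items():
            if head in tokens and tail in tokens:
                labeled.append((s, head, tail, rel))
    return labeled

data = distant_label(corpus, kb)

# Sentences with the same entity pair are merged into one sentence packet (step 1).
packets = defaultdict(list)
for s, h, t, r in data:
    packets[(h, t, r)].append(s)
```

Note that the third sentence is labeled `capital_of` even though it does not express that relation; this wrong-labeling noise is exactly what the packet-level attention mechanism of step 6 is designed to suppress.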
3. The method as claimed in claim 1, wherein in step 2, the word embedding vector of each word of the sentences in the packet is obtained through the word2vec tool, and the feature vector representation of each sentence is obtained with a segmented convolutional neural network as the sentence encoder, wherein the segmented convolutional neural network is a neural network model that takes the feature vector sequence of the words in a sentence as input and generates the sentence feature vector representation through convolution and segmented pooling based on the positions of the two entities in the sentence.
4. The remote supervision relation extraction method fusing knowledge and a constraint graph as claimed in claim 1, wherein in step 1, the entity neighbor graph is defined as a graph K = {E, N}, where E denotes the set of entity nodes, namely the entity set, and N denotes the set of edges; if two entities e_1, e_2 ∈ E appear in the same triple in the knowledge base, there is an edge (e_1, e_2) ∈ N;
the constraint graph is defined as a graph G = {T, R, C}, where T is the set of entity type nodes, and the types of the entities in the data set are identified with the Flair named entity recognition tool;
R is the set of relation nodes formed by all relations, and C is the set of constraint edges; if the entity types of entities e_1, e_2 are t_1, t_2 ∈ T and entities e_1, e_2 have the relation r, then there is a constraint (t_1, r, t_2) ∈ C; each constraint (t_1, r, t_2) corresponds to the two edges (t_1, r) and (r, t_2);
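The construction of the two graphs can be sketched as follows; the triples and entity types are toy stand-ins (in practice the types would come from a NER tool such as Flair), and both graphs are treated as undirected, which is an assumption of this sketch:

```python
# Toy triples <head, tail, relation> and entity types (hypothetical values).
triples = [("e1", "e2", "r1"), ("e2", "e3", "r2")]
etype = {"e1": "PER", "e2": "ORG", "e3": "LOC"}

# Entity neighbor graph K = {E, N}: an edge for every entity pair sharing a triple.
E = sorted({e for h, t, _ in triples for e in (h, t)})
N = {(h, t) for h, t, _ in triples}

# Constraint graph G = {T, R, C}: each constraint (t1, r, t2) yields the two
# edges (t1, r) and (r, t2) between type nodes and relation nodes.
T = sorted(set(etype.values()))
R = sorted({r for _, _, r in triples})
C = {(etype[h], r, etype[t]) for h, t, r in triples}
D_G = {edge for t1, r, t2 in C for edge in ((t1, r), (r, t2))}
print(sorted(D_G))
```

The edge sets N and D_G are exactly what steps 4 and 5 turn into adjacency matrices for the two graph encoders.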
in step 2, for the sentence packet obtained in step 1, a sentence in the packet is denoted s = {w_1, w_2, …, w_{n_s}}, where n_s is the length of sentence s; the input of each word w_i ∈ s consists of its own word embedding vector and its position feature vectors, where the word embedding vector v_i has dimension d_w; the position feature vectors p_i^h and p_i^t are the embedding vector representations of the two relative distances between word w_i and the target entity pair (e_h, e_t) in the sentence, each of vector dimension d_p; the relative distances of the other words are calculated with the position where the entity pair first appears in the sentence as the reference position;
the input representation w_i of word w_i is obtained by concatenation, w_i ∈ R^d, d = d_w + 2d_p; the input representation w_i is shown in formula 1:

w_i = [v_i; p_i^h; p_i^t]  (1)

where ";" denotes the vector concatenation operation; the input of sentence s is represented as the matrix X = [w_1, w_2, …, w_{n_s}] ∈ R^{n_s × d};
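A sketch of the input representation of formula 1; the word embeddings are random stand-ins for word2vec vectors, and the shifted-index scheme for looking up relative-position embeddings is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
sentence = ["Obama", "was", "born", "in", "Hawaii", "."]
h_pos, t_pos = 0, 4              # first occurrence of the target entity pair
d_w, d_p, max_len = 6, 2, 50

word_emb = {w: rng.normal(size=d_w) for w in set(sentence)}  # stand-in for word2vec
pos_emb = rng.normal(size=(2 * max_len, d_p))                # one row per shifted distance

def input_repr(i):
    # Formula 1: w_i = [v_i; p_i^h; p_i^t], with d = d_w + 2 * d_p.
    p_h = pos_emb[i - h_pos + max_len]   # relative distance to the head entity
    p_t = pos_emb[i - t_pos + max_len]   # relative distance to the tail entity
    return np.concatenate([word_emb[sentence[i]], p_h, p_t])

X = np.stack([input_repr(i) for i in range(len(sentence))])
print(X.shape)
```

The resulting matrix X, one d-dimensional row per word, is what the convolutional layer below slides its window over.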
the input representation matrix X of sentence s is encoded with the segmented convolutional neural network to obtain a sentence feature vector of fixed dimension; the segmented convolutional neural network comprises a convolutional layer and a segmented max pooling layer; the parameter matrix of the convolutional layer is denoted W ∈ R^{w×d}, where w denotes the length of the convolution sliding window; the sub-matrix q_m of matrix X under the m-th sliding window is shown in formula 2:

q_m = X_{m−w+1:m} (1 ≤ m ≤ l_s + w − 1)  (2)

where l_s denotes the length of sentence s and m−w+1:m denotes the index interval, in the word sequence of the original sentence, of the word sequence under the sliding window; the convolution of the sub-matrix q_m with the parameter matrix of the convolution kernel yields c_m, as shown in formula 3:

c_m = W ⊙ q_m  (3)

where ⊙ denotes the convolution operation;
during convolution, sliding convolution is performed with stride 1, the part of the convolution window beyond the sentence boundary is padded with zero vectors, and finally the feature vector c representing matrix X is obtained, c ∈ R^{l_s+w−1}; after convolution with a set of d_c convolution kernels {W_1, W_2, …, W_{d_c}}, the representation matrix X corresponds to d_c feature vectors {c_1, c_2, …, c_{d_c}};
the segmented max pooling layer of the segmented convolutional neural network takes the positions of the entity pair in sentence s as segmentation points, divides each feature vector into 3 parts, and then applies the max pooling operation to each part; after segmentation, any feature vector c_i generates 3 feature sub-vectors {c_{i,1}; c_{i,2}; c_{i,3}}; performing max pooling on each feature sub-vector yields the pooled feature vector f_i of dimension 3, as shown in formula 4:

f_i = [max(c_{i,1}); max(c_{i,2}); max(c_{i,3})]  (4)

where max(·) denotes the maximization operation;
after the d_c feature vectors {c_1, c_2, …, c_{d_c}} of matrix X each pass through the segmented max pooling layer, the resulting pooled feature vectors are concatenated and passed through the activation function tanh(·) to obtain the feature vector representation s of X, s ∈ R^{3d_c}, as shown in formula 5:

s = tanh([f_1; f_2; …; f_{d_c}])  (5)

where f_1, f_2, …, f_{d_c} denote the d_c pooled feature vectors; the vector s thus obtained is the feature vector representation of sentence s;
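Formulas 2-5 can be sketched end to end in NumPy; the input matrix and kernels are random stand-ins, and the exact indices at which the padded feature map is split around the entity positions are a simplifying assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
l_s, d, d_c, w = 9, 10, 5, 3         # sentence length, input dim, kernels, window
X = rng.normal(size=(l_s, d))        # stand-in input representation matrix
kernels = rng.normal(size=(d_c, w, d))
h_pos, t_pos = 2, 6                  # entity positions used as segmentation points

# Formulas 2-3: stride-1 convolution with zero padding beyond the boundary.
Xp = np.vstack([np.zeros((w - 1, d)), X, np.zeros((w - 1, d))])
C = np.stack([[np.sum(k * Xp[m:m + w]) for m in range(l_s + w - 1)]
              for k in kernels])     # d_c feature vectors of length l_s + w - 1

# Formula 4: split each feature vector into 3 parts and max-pool each part.
def piecewise_max(c):
    return [c[:h_pos + 1].max(), c[h_pos + 1:t_pos + 1].max(), c[t_pos + 1:].max()]

# Formula 5: concatenate the pooled vectors and squash with tanh.
s_vec = np.tanh(np.concatenate([piecewise_max(c) for c in C]))
print(s_vec.shape)
```

The output dimension 3·d_c is independent of the sentence length, which is why the sentence vectors can be concatenated with the fixed-size graph-encoder outputs in step 6.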
in step 3, BERT is used as the attribute encoder; for each entity of the entity set in step 1, four kinds of entity attribute information are collected in the knowledge base: entity name, entity alias, entity type, and entity description; for each entity, the attribute information is concatenated and input into BERT, which outputs a matrix, and the attribute vector of the entity is obtained by column vector averaging;
for each entity e_i ∈ E of the entity set, the corresponding attribute vector a_i ∈ R^{d_a} is obtained, where d_a denotes the dimension of the entity attribute vector; the attribute vector a_i of entity e_i is calculated as shown in formula 6:

a_i = Mean(BERT([CLS] Name_i [SEP] Alias_i [SEP] Type_i [SEP] Description_i))  (6)

where Mean(·) denotes the column averaging operation; BERT(·) denotes the output matrix calculated with the BERT pre-training model; Name_i denotes the entity name of entity e_i; Alias_i denotes the entity alias of entity e_i; Type_i denotes the entity type of entity e_i in the knowledge base; Description_i denotes the entity description of entity e_i; the attribute information items are separated by the symbol [SEP]; [CLS] is the start identifier added to the BERT input;
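The attribute-encoding pipeline of formula 6 can be sketched with a stub in place of BERT; the stub returns random token vectors of toy dimension d_a (a real implementation would use a pre-trained BERT, e.g. via the transformers library, with hidden size 768), so only the concatenation and column-averaging logic is demonstrated:

```python
import numpy as np

rng = np.random.default_rng(5)
d_a = 8

def bert_stub(text):
    # Stand-in for a BERT encoder: one d_a-dimensional row per whitespace token.
    tokens = text.split()
    return rng.normal(size=(len(tokens), d_a))

def attribute_vector(name, alias, etype, desc):
    # Formula 6: concatenate the attributes separated by [SEP], prepend [CLS],
    # encode, then average the output matrix column-wise.
    text = f"[CLS] {name} [SEP] {alias} [SEP] {etype} [SEP] {desc}"
    return bert_stub(text).mean(axis=0)

a = attribute_vector("Beijing", "Peking", "city", "capital of China")
print(a.shape)
```

The resulting a_i serves as the initial node vector of entity e_i in the neighbor graph encoder of step 4.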
in step 4, an adjacency matrix is constructed from the entity neighbor graph of step 1; with the adjacency matrix and the entity attribute vectors obtained in step 3 as input, the knowledge context vector representation of the target entity is obtained through a neighbor graph encoder constructed with a graph convolutional neural network;
the target entity is any entity of the entity set in step 1; for the entity neighbor graph K = {E, N}, whose set of entity nodes is E and whose set of edges is N, the adjacency matrix A ∈ R^{|E|×|E|} is calculated as shown in formula 7:

A_{ij} = 1, if (v_i, v_j) ∈ N; A_{ij} = 0, otherwise  (7)

where |E| denotes the number of entity nodes; v_i, v_j ∈ E denote any two entity nodes of the entity node set; (v_i, v_j) ∈ N indicates that there is an edge between entity nodes v_i and v_j in the entity neighbor graph;
a two-layer graph convolutional neural network is selected as the neighbor graph encoder, and the entity attribute vector a_i obtained in step 3 is used as the initial vector of entity e_i ∈ E in the entity neighbor graph; for node v_i ∈ E, the output h_i^(k) of the k-th graph convolution layer is calculated as shown in formula 8:

h_i^(k) = Relu(Σ_{j=1..|E|} A_{ij} W^(k) h_j^(k−1) + b^(k))  (8)

where Relu(·) is a nonlinear activation function, |E| denotes the number of entity nodes, W^(k) denotes the weight matrix of the k-th layer of the neighbor graph encoder, b^(k) denotes the bias term of the k-th layer, h_j^(k−1) denotes the feature vector output by the (k−1)-th layer for the j-th entity node, and h_j^(0) = a_j;
the output of the last layer of the neighbor graph encoder is selected as the knowledge context representation vector n_i of entity e_i ∈ E, n_i ∈ R^{d_n}, where d_n denotes the output vector dimension of the last layer of the neighbor graph encoder;
for the entity node set E of the entity neighbor graph K, the knowledge context matrix K ∈ R^{|E|×d_n} is obtained through the neighbor graph encoder; each row vector of matrix K corresponds to the knowledge context vector representation of one entity, which integrates the entity's neighbors and their attribute information;
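A vectorised sketch of formulas 7-8 on a toy graph; the edge list, initial attribute vectors, and weight scales are invented for illustration, and the adjacency matrix is built symmetrically (an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
edges = [(0, 1), (1, 2), (2, 3)]     # toy entity neighbor graph with |E| = 4 nodes
n, d_a, d_n = 4, 6, 5

# Formula 7: adjacency matrix from the edge set.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

relu = lambda x: np.maximum(x, 0.0)
H = rng.normal(size=(n, d_a))        # initial vectors = entity attribute vectors a_i

# Formula 8, two layers: h_i^(k) = Relu(sum_j A_ij * W^(k) h_j^(k-1) + b^(k));
# A @ H @ W.T computes the neighbor sum for all nodes at once.
for d_in, d_out in [(d_a, d_n), (d_n, d_n)]:
    W = rng.normal(size=(d_out, d_in)) * 0.1
    b = np.zeros(d_out)
    H = relu(A @ H @ W.T + b)

print(H.shape)                       # knowledge context matrix, one d_n row per entity
```

Two layers mean each entity's context vector aggregates information from its one-hop and two-hop neighbors, matching the neighbor collection of step 1.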
in step 5, an adjacency matrix is constructed from the constraint graph of step 1; with the adjacency matrix and the vector representations of entity types and relations as input, the vector representations of entity types and relations are obtained through a constraint graph encoder constructed with a graph convolutional neural network;
for the constraint graph G = {T, R, C}, the nodes in the graph comprise entity type nodes and relation nodes, and the node set of the constraint graph is V_G = T ∪ R; the edge set of the constraint graph G is defined as D_G: each constraint (t_1, r, t_2) ∈ C corresponds to the two edges (t_1, r) and (r, t_2) in D_G; the adjacency matrix A_G ∈ R^{|V_G|×|V_G|} of the constraint graph G is calculated as shown in formula 9:

(A_G)_{ij} = 1, if (v_i, v_j) ∈ D_G; (A_G)_{ij} = 0, otherwise  (9)

where |V_G| denotes the number of nodes of the constraint graph; v_i, v_j ∈ V_G denote any two nodes of the node set of the constraint graph; (v_i, v_j) ∈ D_G indicates that there is an edge between nodes v_i and v_j in the constraint graph;
a two-layer graph convolutional neural network is selected as the constraint graph encoder, and the initial input vector of each node of the node set V_G of the constraint graph is obtained by a random initialization operation; for node v_i ∈ V_G, the output g_i^(k) of the k-th graph convolution layer is calculated as shown in formula 10:

g_i^(k) = Relu(Σ_{j=1..|V_G|} (A_G)_{ij} M^(k) g_j^(k−1) + q^(k))  (10)

where Relu(·) is a nonlinear activation function, |V_G| denotes the number of nodes of the constraint graph, M^(k) denotes the weight matrix of the k-th layer of the constraint graph encoder, q^(k) denotes the bias term of the k-th layer, and g_j^(k−1) denotes the feature vector output by the (k−1)-th layer for the j-th node;
the output of the last layer of the constraint graph encoder is selected as the feature vector representation of node v_i ∈ V_G; for the node set V_G of the constraint graph, a feature matrix V_G ∈ R^{(|n_t|+|n_r|)×d_g} is obtained after the constraint graph encoder, where d_g denotes the output vector dimension of the last layer of the constraint graph encoder; each row vector of the feature matrix V_G corresponds to the vector representation of one node of the node set of the constraint graph;
the node set V_G comprises entity type nodes and relation nodes; by splitting the feature matrix V_G, the feature matrix T′ ∈ R^{|n_t|×d_g} of the entity type node set T and the feature matrix R′ ∈ R^{|n_r|×d_g} of the relation node set R are obtained, where |n_t| denotes the number of entity types in the entity type set T and |n_r| denotes the number of relations in the relation node set R;
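The constraint graph encoder of formulas 9-10, including the final split into T′ and R′, can be sketched on a toy graph; the type names, relation names, constraints, and weight scales are all invented for illustration, and the adjacency matrix is again built symmetrically:

```python
import numpy as np

rng = np.random.default_rng(7)
types = ["PER", "ORG", "LOC"]        # entity type nodes T (hypothetical)
rels = ["r1", "r2"]                  # relation nodes R (hypothetical)
nodes = types + rels                 # V_G = T ∪ R
idx = {v: i for i, v in enumerate(nodes)}
constraints = [("PER", "r1", "ORG"), ("ORG", "r2", "LOC")]

# Formula 9: each constraint (t1, r, t2) contributes edges (t1, r) and (r, t2).
A = np.zeros((len(nodes), len(nodes)))
for t1, r, t2 in constraints:
    for u, v in ((t1, r), (r, t2)):
        A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0

# Formula 10, two layers, with randomly initialised node features.
relu = lambda x: np.maximum(x, 0.0)
d_g = 4
G = rng.normal(size=(len(nodes), d_g))
for _ in range(2):
    M = rng.normal(size=(d_g, d_g)) * 0.1
    q = np.zeros(d_g)
    G = relu(A @ G @ M.T + q)

# Split the feature matrix into the type part T' and the relation part R'.
T_feat, R_feat = G[:len(types)], G[len(types):]
print(T_feat.shape, R_feat.shape)
```

Because type nodes and relation nodes exchange messages through the constraint edges, the rows of T′ and R′ encode which argument types each relation admits, which is the constraint information injected in step 6.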
in step 6, with the sentence feature vectors obtained in step 2, the entity context vectors obtained in step 4, and the vector representations of entity types and relations obtained in step 5 as input, the feature vector representation of the sentence packet is calculated through the multi-source fusion attention mechanism;
for a sentence packet B = {s_1, s_2, …, s_{n_b}}, n_b is the number of sentences in sentence packet B, the target entity pair corresponding to the packet is (e_h, e_t), and the corresponding relation label is r ∈ R;
for each sentence s_i ∈ B in sentence packet B, the feature vector s_i of the sentence is obtained through the sentence encoder; for the target entity pair (e_h, e_t) corresponding to the packet, the knowledge context vector representations n_h and n_t of entities e_h and e_t are obtained through the entity attribute encoder and the neighbor graph encoder; for the target entity pair (e_h, e_t) and the relation label r of the packet, the corresponding entity type vectors t_h and t_t and the relation vector r′ ∈ R′ are obtained through the constraint graph encoder;
the multi-source fusion attention mechanism adds the knowledge context vectors and entity type vectors of the entities to the feature vectors of the sentences and of the relation, thereby providing more background knowledge and constraint information for relation extraction; the final vector representation s_i^f of sentence s_i ∈ B is obtained by concatenating the sentence feature vector s_i, the entity type vectors t_h and t_t of the target entity pair, and the knowledge context vectors n_h and n_t of the target entity pair, as shown in formula 11:

s_i^f = [s_i; t_h; t_t; n_h; n_t]  (11)

where s_i^f ∈ R^{d_s}, d_s = 3d_c + 2d_g + 2d_n; the vector s_i has dimension 3d_c, the vectors t_h and t_t have dimension d_g, and the vectors n_h and n_t have dimension d_n; ";" denotes the vector concatenation operation;
the final vector representation r_f of the relation r ∈ R is calculated as shown in formula 12:

r_f = Linear(r′)  (12)

where Linear(·) denotes a linear fully-connected layer whose purpose is to align the dimension of r_f with the vector dimension of s_i^f;
after obtaining the final vector representation s_i^f of each sentence s_i ∈ B and the final vector representation r_f of the relation label r, the feature vector representation b of sentence packet B is obtained through the attention mechanism, as shown in formula 13:

b = Σ_{i=1..n_b} α_i · s_i^f  (13)

where n_b is the number of sentences in sentence packet B and α_i is the attention weight of the final vector representation s_i^f of sentence s_i, calculated as shown in formula 14:

α_i = exp(e_i) / Σ_{j=1..n_b} exp(e_j)  (14)

where e_i is the matching score between the sentence vector s_i^f and the relation label r of the packet, calculated as shown in formula 15:

e_i = s_i^f · r_f  (15)

where "·" denotes the vector dot product operation;
in step 7, with the feature vector representation of the sentence packet obtained in step 6, the relation label of the sentence packet is predicted through a relation classifier;
after the feature vector b of packet B is obtained through the multi-source fusion attention mechanism, the relation classifier predicts the relation label of sentence packet B; the prediction score of sentence packet B with respect to each relation r_i in the relation set R is calculated as shown in formula 16:

o_i = b · r_i^f + b_i  (16)

where r_i^f denotes the final vector representation of relation r_i and b_i denotes a bias term;
after the prediction score of sentence packet B with respect to each relation r_i in the relation set R is obtained, the probability P that packet B is classified into relation r_i is calculated through the softmax function, as shown in formula 17:

P(r_i | B; θ) = exp(o_i) / Σ_{j=1..|n_r|} exp(o_j)  (17)

where |n_r| denotes the number of relations in the relation set R, o_j denotes the prediction score of sentence packet B with respect to the j-th relation r_j in R, and θ denotes the parameters of the relation extraction model.
5. The remote supervision relation extraction method fusing knowledge and a constraint graph as claimed in claim 4, wherein the relation extraction model is trained with the cross-entropy loss function, whose calculation formula is shown in formula 18:

J(θ) = −Σ_{i=1..n_B} log P(r_i | B_i; θ)  (18)

where n_B denotes the number of sentence packets in the training set used and r_i denotes the relation label of sentence packet B_i; θ denotes the parameters of the relation extraction model used in the invention and includes all parameters in the above steps: the parameters of the sentence encoder in step 2, the parameters of the attribute encoder in step 3, the parameters of the entity neighbor graph encoder in step 4, the parameters of the constraint graph encoder in step 5, the parameters of the multi-source fusion attention mechanism in step 6, and the parameters of the relation classifier in step 7; the parameters include weight matrices and bias terms.
CN202211185558.2A 2022-09-27 2022-09-27 Remote supervision relation extraction method fusing knowledge and constraint graph Pending CN115545005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211185558.2A CN115545005A (en) 2022-09-27 2022-09-27 Remote supervision relation extraction method fusing knowledge and constraint graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211185558.2A CN115545005A (en) 2022-09-27 2022-09-27 Remote supervision relation extraction method fusing knowledge and constraint graph

Publications (1)

Publication Number Publication Date
CN115545005A true CN115545005A (en) 2022-12-30

Family

ID=84730059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211185558.2A Pending CN115545005A (en) 2022-09-27 2022-09-27 Remote supervision relation extraction method fusing knowledge and constraint graph

Country Status (1)

Country Link
CN (1) CN115545005A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737965A (en) * 2023-08-11 2023-09-12 深圳市腾讯计算机系统有限公司 Information acquisition method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN108959252B (en) Semi-supervised Chinese named entity recognition method based on deep learning
CN108573411B (en) Mixed recommendation method based on deep emotion analysis and multi-source recommendation view fusion of user comments
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN108388651B (en) Text classification method based on graph kernel and convolutional neural network
CN111753024B (en) Multi-source heterogeneous data entity alignment method oriented to public safety field
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN109684642B (en) Abstract extraction method combining page parsing rule and NLP text vectorization
CN113468888A (en) Entity relation joint extraction method and device based on neural network
CN113128229A (en) Chinese entity relation joint extraction method
CN107679110A (en) The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction
CN109165275B (en) Intelligent substation operation ticket information intelligent search matching method based on deep learning
CN111881677A (en) Address matching algorithm based on deep learning model
CN111274804A (en) Case information extraction method based on named entity recognition
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN110555084A (en) remote supervision relation classification method based on PCNN and multi-layer attention
CN113407660A (en) Unstructured text event extraction method
CN114021584B (en) Knowledge representation learning method based on graph convolution network and translation model
CN112989833A (en) Remote supervision entity relationship joint extraction method and system based on multilayer LSTM
CN114528368B (en) Spatial relation extraction method based on fusion of pre-training language model and text features
CN115545005A (en) Remote supervision relation extraction method fusing knowledge and constraint graph
CN112905793B (en) Case recommendation method and system based on bilstm+attention text classification
CN112069825B (en) Entity relation joint extraction method for alert condition record data
CN114048314A (en) Natural language steganalysis method
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN113486181A (en) Synchronous extraction method of multiple relations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination