CN114021584B - Knowledge representation learning method based on graph convolution network and translation model - Google Patents

Knowledge representation learning method based on graph convolution network and translation model

Info

Publication number
CN114021584B
CN114021584B (application CN202111240396.3A)
Authority
CN
China
Prior art keywords
entity
representation
entities
knowledge
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111240396.3A
Other languages
Chinese (zh)
Other versions
CN114021584A (en)
Inventor
周惠巍
李雪菲
徐奕斌
姜海斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202111240396.3A priority Critical patent/CN114021584B/en
Publication of CN114021584A publication Critical patent/CN114021584A/en
Application granted granted Critical
Publication of CN114021584B publication Critical patent/CN114021584B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

A knowledge representation learning method based on a graph convolution network and a translation model first learns entity and relation representations in a knowledge base using a translation model. Then, with the knowledge base as a guide, distant supervision is used to obtain the entities of biomedical text and their relation labels. Entity representations in the text are then learned using GCGCN. Finally, the entity representations in the knowledge base and the text are aligned, so that the representations learned from the knowledge base and from the distantly supervised text coexist in the same vector space. Based on the translation model and the graph convolution network, the invention effectively fuses the knowledge base with large-scale distantly supervised text information, realizes multi-source information fusion, obtains high-quality knowledge representations, and improves the performance of biomedical relation extraction models. Structured knowledge in the knowledge base is learned with the translation model, contextual knowledge in the large-scale distantly supervised text is learned with the graph convolution network, and high-quality knowledge representations are finally obtained by fusing the multi-source knowledge through entity alignment.

Description

Knowledge representation learning method based on graph convolution network and translation model
Technical Field
Based on a graph convolution network (Graph Convolutional Networks, GCN) and a translation model, the invention fuses the triples in the knowledge graph with the context in large-scale distantly supervised text to perform knowledge representation learning. First, a knowledge representation is learned from knowledge base triples using a translation model. Then, a graph convolution network is used to learn the entities in large-scale biomedical text obtained by distant supervision. Finally, the entities in the knowledge base and in the biomedical text are aligned, realizing entity fusion based on the knowledge base and large-scale distantly supervised text information. The invention is mainly intended for biomedical relation extraction tasks in the field of natural language processing.
Background
With the rapid development of computer technology and biotechnology, the literature in the biomedical field is growing exponentially. Researchers are eager to reveal the biomedical knowledge contained in massive biomedical documents, promote biomedical development, and improve people's quality of life. This demand has driven the creation and development of biomedical information extraction technologies.
The vast biomedical literature contains abundant and valuable knowledge. Meanwhile, researchers in the biomedical field have spent a great deal of effort studying and constructing large-scale, high-quality biomedical knowledge bases. Biomedical knowledge bases provide powerful entity semantics and entity relation knowledge resources for biomedical information extraction, and are important knowledge resources for promoting intelligent medical development. In recent years, techniques for learning representations of the entities and relations in a knowledge base have received a great deal of attention.
Existing knowledge-base-based knowledge representation learning methods simply learn entity and relation representations from the knowledge base using a translation model or the like. Knowledge representation learning based solely on a knowledge base lacks the entity and relation information contained in large-scale biomedical text.
Thus, researchers merge knowledge base and text information to improve knowledge representation capability: entity and relation representations in the knowledge base are learned with a translation model or the like, while sentence representations describing the relation between two entities are learned with a convolutional neural network. Finally, the knowledge base and text entities and their relation representations are aligned, realizing knowledge representation based on the fusion of knowledge base and text information.
However, the expression of entity relations in biomedical text is complex, including both intra-sentence and inter-sentence entity relations. Therefore, knowledge representation learning that merges knowledge base and text information must consider the entity relations of document-level text.
Moreover, entity relation annotated corpora are scarce in the biomedical field; to obtain large-scale entity relation annotations, distant supervision is generally used to label large-scale unlabeled biomedical corpora. However, the model cannot determine which sentence in the sentence set (bag) corresponding to a relation instance actually expresses the relation: a sentence that does not express a relation may be treated in modeling as one that does, or vice versa. To avoid introducing noisy data, an attention mechanism over an entity pair is employed to learn a weight for each sentence in the document; all sentences are then weighted and summed to obtain a document representation of the entity pair. This approach fails to jointly learn the semantic representations of entities, entity relations, and sentences over all sentences and entities in a document.
In recent years, researchers have applied graph convolution networks to document-level relation extraction tasks and achieved good entity relation extraction performance. It is therefore worth exploring how to use the graph convolution network to mine the semantic information of document-level entities, entity relations, and sentences while simultaneously integrating the entities and relation information of the biomedical knowledge base, so as to realize high-quality knowledge representation based on the fusion of knowledge base and text information.
Disclosure of Invention
In view of the problems of existing methods, the invention provides a method (GCGCN-TransE) that combines a graph convolution network and a translation model to learn knowledge representations, obtaining knowledge representations based on the fusion of knowledge base and text information.
First, based on the knowledge base, entity and relation representations are learned using a translation model. Then, with the knowledge base as a guide, distant supervision is used to obtain the entities of biomedical text and their relation labels. Next, GCGCN (Zhou et al., Global Context-enhanced Graph Convolutional Networks for Document-level Relation Extraction, COLING 2020) is employed to learn the entity representations in the text.
In biomedical knowledge representation learning, the invention can integrate information about entities from both the knowledge base and large-scale text, realizing knowledge representation learning based on multi-source information and improving knowledge representation capability.
The technical scheme of the invention is as follows:
Knowledge representation learning based on a graph convolution network and a translation model comprises the following steps:
Step one: biomedical text entity relation annotation based on distant supervision
An entity recognizer automatically identifies biomedical entities in the large-scale unlabeled corpus; with the biomedical knowledge base as a guide, distant supervision is used to label the entity relations in the large-scale unlabeled corpus.
Step two: feature sequence construction
Word vectors are encoded using the BioBERT pre-trained language model.
Step three: knowledge representation for learning knowledge base based on translation model
And learning entity and relation representations in the biomedical knowledge base triples (h, r, t) by adopting a translation model.
Step four: GCGCN-based learning of knowledge representation of large-scale remote supervision corpus
The multi-layer graph convolution can solve a great number of cross-sentence multi-hop reasoning problems in the document level relation extraction. To collect rich global information, the node and edge representations are learned using a multi-layer graph rolling operation.
Step five: entity fusion based on knowledge base and biomedical text information
And aligning entity representations in the knowledge base and the text, and realizing knowledge representation of multi-source heterogeneous information fusion. The knowledge base and the learned entity representations in the text are made to coexist in the same vector space.
The invention has the beneficial effects that: the invention effectively fuses the knowledge base and the large-scale remote supervision text information based on the translation model and the graph convolution network, realizes multi-source information fusion, acquires high-quality knowledge representation, and improves the performance of the biomedical relation extraction model. And learning the structured knowledge in the knowledge base based on the translation model, learning the context knowledge in the large-scale remote supervision text based on the graph convolution network, and finally obtaining high-quality knowledge representation through entity alignment fusion of the multi-source knowledge.
Drawings
Fig. 1 is a basic flow diagram of a system.
FIG. 2 is an example document level entity interaction graph construction.
FIG. 3 is an example of entity alignment in a knowledge base and text.
Detailed Description
The knowledge base of the invention is the Comparative Toxicogenomics Database (CTD), a knowledge base containing knowledge of drug-gene, drug-disease, and gene-disease relationships, among others. In the experiments, the CTD knowledge base is used to obtain the relationships between disease and drug entities in the large-scale unlabeled corpus, with a focus on drug-induced disease relationships.
The specific steps of the invention are further described below with reference to fig. 1 and the technical scheme:
Step one: labeling all drug entities and disease entities in the PubMed abstract and corresponding MeSH IDs thereof by using a text mining tool PubTator(Wei C H,Kao H Y,Lu Z.PubTator:aweb-based text mining tool for assisting biocuration[J].Nucleic acids research,2013,41(W1):W518-W522.); the entity relationship in the large-scale unlabeled corpus is labeled by adopting remote supervision with a comparative toxicological genomics database (Comparative Toxicogenomics Database, CTD) as a guide. For all entity pairs in a document, if a certain pair of entities has a certain relation in a knowledge base, the pair of entities in the document are considered to have the relation, and the relation of the pair of entities is marked.
Step two: the word vector is encoded by utilizing BioBERT pre-training language model, the input text is required to be processed into BioBERT input form, namely, a special identifier [ CLS ] is added at the head end of the text, a special separator [ SEP ] is added at the tail end of each sentence, and word segmentation processing is carried out on the input sequence. Finally, learning the segmented input sequence through BioBERT pre-training language model, extracting hidden layer representation output by the last layer of network as word vector, wherein the word vector of the ith segmented word is
Constructing an entity type matrix E type and a co-index matrix E corf by a random initialization method, and performing label mapping on each word in a text sequence to obtain a corresponding type feature vectorAnd co-index feature vectorWherein t i and c i are category labels and co-index labels of the ith word.
The obtained word vector, the type feature vector and the common-finger feature vector are spliced to construct the features finally input to the context semantic encoder, and the formula is as follows:
Wherein "; "is a vector concatenation operation, the dimension of the final feature is d=d w+dt+dc.
Step three: knowledge representation for learning knowledge base based on translation model
Knowledge representation of the knowledge base triplet (h, r, t) is learned using a translation model. e h、et、er are representations of head, tail entities and relationships, respectively. The translation model defines an energy function d (·) that can measure how well a set relationship between an entity and a relational representation is satisfied, and a loss function L k is represented as:
Wherein gamma > 0 is the boundary, S is the triplet set of the knowledge base, and S' is the negative example set of the entity relationship. Learning obtains a representation e h、et、er of head, tail entities and relationships based on the knowledge base.
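The margin-based objective above can be sketched as follows; the batch size, embedding dimension, margin value, and the random embeddings standing in for learned parameters are illustrative assumptions.

```python
# Minimal TransE sketch for Step Three (sizes and margin are illustrative).
import torch
import torch.nn.functional as F

def transe_loss(e_h, e_r, e_t, e_h_neg, e_t_neg, gamma=1.0):
    """Margin-based ranking loss: positive triples (h, r, t) should satisfy
    e_h + e_r ≈ e_t better than corrupted triples by at least margin gamma."""
    d_pos = torch.norm(e_h + e_r - e_t, p=2, dim=-1)
    d_neg = torch.norm(e_h_neg + e_r - e_t_neg, p=2, dim=-1)
    return F.relu(gamma + d_pos - d_neg).mean()

dim = 100
e_h, e_r, e_t = (torch.randn(32, dim, requires_grad=True) for _ in range(3))
e_h_neg, e_t_neg = (torch.randn(32, dim) for _ in range(2))   # corrupted head/tail
loss = transe_loss(e_h, e_r, e_t, e_h_neg, e_t_neg)
loss.backward()
print(float(loss))
```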
Step four: learning knowledge representations of the large-scale distant supervision corpus based on GCGCN
For each document-level input sample, the input data of the graph structure is constructed. Each input sample labels an entity set {e_1, e_2, …, e_N}, where N is the number of entities. An entity interaction graph is constructed by the following two rules: each entity in the entity set is a node in the graph, i.e., the graph has N nodes; if mentions of two entities occur in the same sentence, the nodes representing those two entities are connected by an undirected edge.
The constructed entity interaction graph is denoted G(A, E), where A is the adjacency matrix: A_ij = 1 if there is an edge between node i and node j, otherwise A_ij = 0.
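A small sketch of this graph construction, assuming the sentence indices of each entity's mentions are already available from the tagger (the input format is hypothetical):

```python
# Sketch of the entity interaction graph of Step Four: nodes are entities,
# and an undirected edge links two entities whose mentions co-occur in a sentence.
import numpy as np

def build_adjacency(mentions, n_entities):
    """mentions: dict mapping entity index -> set of sentence indices."""
    A = np.zeros((n_entities, n_entities), dtype=int)
    for u in range(n_entities):
        for v in range(u + 1, n_entities):
            if mentions[u] & mentions[v]:      # co-occur in at least one sentence
                A[u, v] = A[v, u] = 1
    return A

mentions = {0: {0, 2}, 1: {2}, 2: {5}}         # entity -> sentences containing a mention
print(build_adjacency(mentions, 3))
# [[0 1 0]
#  [1 0 0]
#  [0 0 0]]
```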
GCGCN comprises four layers: an embedding layer, a context-aware attention-guided graph convolution (Context-aware Attention Guided Graph Convolution, CAGGC) module, a multi-head attention-guided graph convolution (Multi-head Attention Guided Graph Convolution, MAGGC) module, and a relation classification layer.
Embedding layer
Word vectors are encoded using the BioBERT pre-trained language model. Given a document D = {s_1, s_2, …, s_S}, the encoded word vector sequence is {q_{i,j}}, where q_{i,j} ∈ R^{d_w} is the word vector of the j-th word in the i-th sentence and d_w is the vector dimension.
The word vector q_{i,j}, the entity type vector t_{i,j}, and the coreference vector c_{i,j} are concatenated to obtain the final word vector sequence:
x_{i,j} = [q_{i,j}; t_{i,j}; c_{i,j}]
Since an entity may have multiple mentions, and a mention may contain multiple words, an averaging operation is used to compute the entity representation, denoted P^(0):
P_v^(0) = (1/J) Σ_{q=1}^{J} (1/(t - s + 1)) Σ_{j=s}^{t} x_j
where P_v^(0) is the representation of entity e_v, J is the number of its mentions, m_q is the q-th mention of e_v, and s and t are its start and end positions.
Context-aware attention-guided graph convolution (Context-aware Attention Guided Graph Convolution, CAGGC) module
An attention mechanism and a gating mechanism are used to compute entity-aware edge representations containing rich context information. The computed edge representations then guide the generation of a weighted adjacency matrix, and finally the node representations are updated over multiple densely connected graph convolution sublayers.
Because an edge may be associated with multiple context sentences, to compute the representation of the edge between node u and node v, a word-level attention mechanism is first used to obtain the representation of each sentence, and a gating mechanism then fuses the information of the multiple sentences into an entity-aware edge representation.
The representation h_i of the i-th sentence on edge uv is first computed from the word vector of each word and its relative distance to the given entity:
α_{i,j}^c = softmax_j(z⊤ tanh(W_1 x_{i,j} + W_2 d_{j,c} + b_1))
h_i^c = Σ_{j=1}^{m} α_{i,j}^c x_{i,j}
where c ∈ {u, v} denotes either of the two entities, d_{j,c} is the relative distance vector between the current word and entity c, α_{i,j}^c is the attention weight of the j-th word in the i-th sentence as perceived by entity c, m is the number of words in the i-th sentence, and W_1, W_2, z, and b_1 are trainable parameters.
Word-level attention is computed over the i-th sentence with entities u and v respectively, yielding two sentence representations h_i^u and h_i^v. These are concatenated and fed into a fully connected layer to obtain a sentence representation h_i that perceives entities u and v simultaneously:
h_i = tanh(W_s [h_i^u; h_i^v] + b_s)
where W_s and b_s are trainable parameters.
To make the model consider the information of all sentences on edge uv, an entity-aware gating mechanism is adopted. For entity c ∈ {u, v}, its initial representation P_c^(0) is used to compute the weight of each sentence, and the weighted sum of all sentences serves as the edge representation:
β_i^c = σ(W_4 h_i + W_5 P_c^(0) + b_2)
E_{uv}^c = Σ_{i=1}^{S} β_i^c ⊙ (W_3 h_i)
where σ(·) is the sigmoid or ReLU activation function, W_3, W_4, W_5, and b_2 are trainable parameters, and S is the total number of sentences.
The edge representations E_{uv}^u and E_{uv}^v perceived by entities u and v are concatenated and fed into a fully connected layer to obtain an edge representation that perceives both entities simultaneously:
E_{uv} = tanh(W_sg [E_{uv}^u; E_{uv}^v] + b_sg)
where ";" denotes the concatenation operation and W_sg and b_sg are trainable parameters.
Through the above calculation, the initial edge representation matrix E^(1) of the CAGGC network is obtained. The proposed entity-aware gating mechanism has two characteristics. First, the representations of the two entities are introduced when computing the gating value, giving greater weight to sentences related to both entities; second, the weight of each sentence is computed with an activation function, so the model can effectively control the information flow even when the edge being computed has only one sentence.
The adjacency matrix used in a traditional graph convolution network consists of 0s and 1s indicating whether an edge exists between nodes; it cannot distinguish at a finer granularity how relevant a neighboring node is to the current node, and thus cannot effectively control information propagation between entities. A weighted adjacency matrix is therefore computed that comprehensively considers node information and edge information. The weight between nodes u and v, denoted Ã_uv, is calculated as:
Ã_uv = exp(W σ(W_u P_u + W_v P_v + W_e E_{uv})) / Σ_{w=1}^{N} exp(W σ(W_u P_u + W_v P_w + W_e E_{uw}))
where W, W_u, W_v, and W_e are trainable parameters, and exp denotes the exponential function with base e.
The GCGCN model also blends the edge representations into the graph convolution operation, updating the node representations with rich context information. The two hierarchical graph convolution inference modules of GCGCN (CAGGC and MAGGC) each contain K densely connected sublayers; the result of node v passing through the k-th sublayer is:
P_v^(k) = σ(Σ_{u=1}^{N} Ã_uv W^(k) g_u^(k) + b^(k))
where W^(k) and b^(k) are the trainable parameters of the k-th sublayer.
Dense connections fuse the initial node representation with the outputs of the previous k-1 sublayers as the input of the current sublayer:
g_u^(k) = [P_u^(0); P_u^(1); …; P_u^(k-1)]
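The densely connected sublayer can be sketched as below; the weighted adjacency is taken as given, and the dimensions, activation, and class structure are illustrative assumptions rather than the patent's exact configuration.

```python
# Sketch of one densely connected graph-convolution sublayer (Step Four).
# A_tilde is the weighted adjacency; g is the dense-connection input
# [P^(0); outputs of sublayers 1..k-1]. Dimensions are illustrative.
import torch
import torch.nn as nn

class DenseGCNSublayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim)

    def forward(self, A_tilde, g):
        # P_v^(k) = sigma( sum_u A_tilde[u, v] * (W g_u + b) )
        return torch.relu(A_tilde.T @ self.W(g))

N, d = 4, 32
A_tilde = torch.softmax(torch.randn(N, N), dim=0)   # stand-in weighted adjacency
layer1 = DenseGCNSublayer(d, d)
layer2 = DenseGCNSublayer(2 * d, d)
p0 = torch.randn(N, d)
p1 = layer1(A_tilde, p0)
p2 = layer2(A_tilde, torch.cat([p0, p1], dim=-1))   # dense connection
print(p2.shape)  # torch.Size([4, 32])
```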
Multi-head attention-guided graph convolution (MAGGC) module
Multi-head attention is used to collect the interactions between all nodes, in particular nodes connected by multi-hop paths.
Owing to the multi-head attention mechanism, MAGGC expands the partially connected graph used in the previous module into a weighted fully connected graph. To compute the edge representations first, the MAGGC module replaces P^(0) in the CAGGC module with P^(1) and computes the entity-aware edge representation matrix E^(2) in the same manner; if entities u and v do not appear together in any sentence, the edge E_{uv}^(2) is a zero vector.
Unlike the CAGGC module, which considers the influence of context information, MAGGC computes the adjacency matrix directly with the self-attention mechanism:
Ã = softmax((P W_Q)(P W_K)⊤ / √d)
where W_Q and W_K are trainable parameters and d is the vector dimension.
Since multi-head attention comprises multiple self-attention heads, t different adjacency matrices {Ã_1, …, Ã_t} are computed with the above formula. The resulting t output representations {P_1^(2); P_2^(2); …; P_t^(2)} are first reduced in dimension and then concatenated to obtain the output P^(2) of the MAGGC module.
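A sketch of the per-head adjacency computation follows; the head count, dimensions, and module structure are illustrative assumptions.

```python
# Sketch of the MAGGC adjacency computation: each head derives a soft adjacency
# matrix from node representations via scaled dot-product self-attention.
import math
import torch
import torch.nn as nn

class AttentionAdjacency(nn.Module):
    def __init__(self, d, n_heads):
        super().__init__()
        self.W_Q = nn.ModuleList(nn.Linear(d, d, bias=False) for _ in range(n_heads))
        self.W_K = nn.ModuleList(nn.Linear(d, d, bias=False) for _ in range(n_heads))
        self.d = d

    def forward(self, P):
        # One (N, N) weighted adjacency per attention head.
        return [torch.softmax(wq(P) @ wk(P).T / math.sqrt(self.d), dim=-1)
                for wq, wk in zip(self.W_Q, self.W_K)]

P = torch.randn(4, 32)                 # node representations P^(1), illustrative size
adjs = AttentionAdjacency(32, n_heads=3)(P)
print(len(adjs), adjs[0].shape)        # 3 torch.Size([4, 4])
```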
Relationship classification layer
The initial node representation obtained from the encoding layer and the node representations computed by the two graph convolution inference modules are concatenated, fed into a fully connected layer, and passed through an activation function to obtain the final node representation:
P = tanh(W_p [P^(0); P^(1); P^(2)] + b_p)
where P^(0) is the initial node representation, P^(1) and P^(2) are the node representations output by the CAGGC and MAGGC modules respectively, and W_p and b_p are trainable parameters.
The entity representations and relative distance vectors are concatenated, and the entity-pair relation features for relation classification are obtained with a bilinear function and a fully connected layer:
P_u′ = [P_u; E(d_{u,v})]
P_v′ = [P_v; E(d_{v,u})]
P(r|u,v) = sigmoid(P_u′⊤ W_r P_v′ + W_t [P_u′; P_v′] + b_r)
where ";" denotes the concatenation operation, d_{u,v} and d_{v,u} are the relative distances of the first mentions of the two entities, and E is the mapping matrix of the relative distance vectors.
Because the distant supervision corpus contains multiple relations, a binary cross-entropy loss for multi-label classification is used to compute the loss value during training:
L_T = - Σ_{(u,v)∈S} Σ_{r∈R} [II(r ∈ y_{u,v}) log P(r|u,v) + (1 - II(r ∈ y_{u,v})) log(1 - P(r|u,v))]
where S denotes the entire training set, II(·) is the indicator function, y_{u,v} is the set of labeled relations of entity pair (u, v), and R is the set of predefined relation types.
Step five: entity fusion based on knowledge base and biomedical text information
Aligning the text-based entity representations with the translation-model-based entity representations yields an entity alignment loss L_A, i.e., minimizing:
L_A = Σ_i D(P_i, e_i)
where D(P_i, e_i) is the distance between the textual entity representation P_i and the translation-model-based entity representation e_i. A matrix M maps the textual entity representation P_i into the space of the translation-model entity representation e_i:
D(P_i, e_i) = ||M P_i - e_i||
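A minimal sketch of the alignment objective, with illustrative dimensions assumed for the text-side and KB-side embedding spaces:

```python
# Sketch of the Step Five alignment objective: a linear map M projects the
# text-side entity representation into the TransE embedding space, and the
# loss is the summed distance to the corresponding KB entity embedding.
import torch
import torch.nn as nn

class AlignmentLoss(nn.Module):
    def __init__(self, d_text, d_kb):
        super().__init__()
        self.M = nn.Linear(d_text, d_kb, bias=False)   # mapping matrix M

    def forward(self, P, e):
        # L_A = sum_i || M P_i - e_i ||
        return torch.norm(self.M(P) - e, p=2, dim=-1).sum()

align = AlignmentLoss(d_text=96, d_kb=100)
loss = align(torch.randn(8, 96), torch.randn(8, 100))
loss.backward()
print(float(loss))
```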
According to the credibility and consistency of the knowledge base and the text information, the interrelationships among the knowledge base loss L_K, the text loss L_T, and the alignment loss L_A are studied to obtain the optimal knowledge representation fusing the biomedical knowledge base and text information.
The knowledge representations obtained with this patent are used for entity relation extraction in the biomedical field. Testing was performed directly on the BioCreative V CDR test data without using the BioCreative V CDR training data. For a pair of candidate entities in a document of the test data, the cosine similarity between the difference of the head and tail entity representations and each relation representation is computed to determine the relation of the entity pair. Following the description of the CDR corpus, the chemical-induced disease (CID) relation in the CDR corpus corresponds to the "marker/mechanism" relation in CTD. The entity pair with the maximum similarity to the "marker/mechanism" relation is considered to have a CID relation. The experimental results are shown in the following table:
Knowledge representation               P(%)    R(%)    F(%)
TransE (cosine similarity)             47.51   11.63   18.69
GCGCN-TransE (cosine similarity)       51.02   67.82   58.24
Experimental results show that the final F value of the proposed GCGCN-TransE (cosine similarity) method, which uses only entity representations, improves by 38.41% over the traditional TransE (cosine similarity) method, indicating that the proposed knowledge representation learning method GCGCN-TransE based on a graph convolution network and a translation model can effectively capture and fuse knowledge from the biomedical knowledge base and distantly supervised text information, thereby obtaining high-quality knowledge representations.
The knowledge representations are further applied to the deep neural network model GCGCN to extract biomedical entity relations. First, based on GCGCN, an entity relation extraction model is trained with the BioCreative V CDR training data and tested directly on the BioCreative V CDR test data. Then, the classification layer of GCGCN concatenates the entity representations learned with TransE and GCGCN-TransE respectively, training two models: TransE (neural network) and GCGCN-TransE (neural network). The results on the BioCreative V CDR test data are shown in the following table:
System name                       P(%)    R(%)    F(%)
TransE (neural network)           54.79   15.57   24.25
GCGCN (Zhou et al.)               54.95   67.73   60.67
GCGCN-TransE (neural network)     59.83   64.26   61.96
Experimental results show that the final F value of the proposed GCGCN-TransE method improves by 1.29% over the traditional GCGCN method of Zhou et al., indicating that the proposed biomedical relation extraction system GCGCN-TransE, based on knowledge representation learning with a graph convolution network and a translation model, can effectively capture the optimal knowledge representation fusing the biomedical knowledge base and text information, thereby obtaining better results in biomedical relation extraction.

Claims (1)

1. A knowledge representation learning method based on a graph convolution network and a translation model, characterized by comprising the following steps:
Step one: labeling all drug entities and disease entities in PubMed abstracts together with their corresponding MeSH IDs using the text mining tool PubTator; labeling the entity relations in the unlabeled corpus of documents by distant supervision with the Comparative Toxicogenomics Database as a guide; for all entity pairs in a document, if a pair of entities has a certain relation in the knowledge base, the pair of entities in the document is considered to have that relation, and the relation of the pair is labeled;
Step two: encoding word vectors by utilizing BioBERT pre-training language models, processing an input text into an input form of BioBERT, namely adding a special identifier [ CLS ] at the head end of the text, adding a special separator [ SEP ] at the tail end of each sentence, and performing word segmentation on an input sequence; finally, learning the segmented input sequence through BioBERT pre-training language model, extracting hidden layer representation output by the last layer of network as word vector, wherein the word vector of the ith segmented word is
Constructing an entity type matrix E type and a co-index matrix E corf by a random initialization method, and performing label mapping on each word in a text sequence to obtain a corresponding type feature vectorAnd co-index feature vectorWherein t i and c i are class tags and co-index tags of the ith word;
The obtained word vector, the type feature vector and the common-finger feature vector are spliced to construct the features finally input to the context semantic encoder, and the formula is as follows:
wherein "; "is a vector concatenation operation, the dimension of the final feature is d=d w+dt+dc;
Step three: knowledge representation for learning knowledge base based on translation model
Learning knowledge representations of the knowledge base triples (h, r, t) using the translation model; e h、et、er is a representation of the head, tail entities and relationships, respectively; the translation model defines an energy function d (·) that can measure how well a set relationship between an entity and a relational representation is satisfied, and a loss function L k is represented as:
Wherein, gamma > 0 is the boundary, S is the triplet set of the knowledge base, S' is the negative example set of the entity relation; learning to obtain a representation e h、et、er of head, tail entities and relationships based on the knowledge base;
Step four: learning knowledge representations of the large-scale distant supervision corpus based on GCGCN
constructing the input data of the graph structure for each document-level input sample; each input sample labels an entity set {e_1, e_2, …, e_N}, where N is the number of entities; an entity interaction graph is constructed by the following two rules: each entity in the entity set is a node in the graph, i.e., the graph has N nodes; if mentions of two entities occur in the same sentence, the nodes representing those two entities are connected by an undirected edge;
the constructed entity interaction graph is denoted G(A, E), where A is the adjacency matrix: A_ij = 1 if there is an edge between node i and node j, otherwise A_ij = 0;
GCGCN comprises four layers: an embedding layer, a context-aware attention-guided graph convolution module, a multi-head attention-guided graph convolution module, and a relation classification layer;
(1) Embedding layer
encoding word vectors using the BioBERT pre-trained language model; given a document D = {s_1, s_2, …, s_S}, the encoded word vector sequence is {q_{i,j}}, where q_{i,j} ∈ R^{d_w} is the word vector of the j-th word in the i-th sentence and d_w is the vector dimension;
concatenating the word vector q_{i,j}, the entity type vector t_{i,j}, and the coreference vector c_{i,j} to obtain the final word vector sequence:
x_{i,j} = [q_{i,j}; t_{i,j}; c_{i,j}]
since an entity may have multiple mentions and a mention may contain multiple words, an averaging operation is used to compute the entity representation, denoted P^(0):
P_v^(0) = (1/J) Σ_{q=1}^{J} (1/(t - s + 1)) Σ_{j=s}^{t} x_j
where P_v^(0) is the representation of entity e_v, J is the number of its mentions, m_q is the q-th mention of e_v, and s and t are its start and end positions;
(2) Context-aware attention-guided graph convolution module
computing entity-aware edge representations containing rich context information using an attention mechanism and a gating mechanism; then guiding the generation of a weighted adjacency matrix with the computed edge representations, and finally updating the node representations over multiple densely connected graph convolution sublayers;
because an edge may be associated with multiple context sentences, to compute the representation of the edge between node u and node v, a word-level attention mechanism is first used to obtain the representation of each sentence, and a gating mechanism then fuses the information of the multiple sentences into an entity-aware edge representation;
the representation h_i of the i-th sentence on edge uv is first computed from the word vector of each word and its relative distance to the given entity:
α_{i,j}^c = softmax_j(z⊤ tanh(W_1 x_{i,j} + W_2 d_{j,c} + b_1))
h_i^c = Σ_{j=1}^{m} α_{i,j}^c x_{i,j}
where c ∈ {u, v} denotes either of the two entities, d_{j,c} is the relative distance vector between the current word and entity c, α_{i,j}^c is the attention weight of the j-th word in the i-th sentence as perceived by entity c, m is the number of words in the i-th sentence, and W_1, W_2, z, and b_1 are trainable parameters;
computing word-level attention over the i-th sentence with entities u and v respectively to obtain two sentence representations h_i^u and h_i^v, which are concatenated and fed into a fully connected layer to obtain a sentence representation h_i that perceives entities u and v simultaneously:
h_i = tanh(W_s [h_i^u; h_i^v] + b_s)
where W_s and b_s are trainable parameters;
to make the model consider the information of all sentences on edge uv, an entity-aware gating mechanism is adopted; for entity c ∈ {u, v}, its initial representation P_c^(0) is used to compute the weight of each sentence, and the weighted sum of all sentences serves as the edge representation:
β_i^c = σ(W_4 h_i + W_5 P_c^(0) + b_2)
E_{uv}^c = Σ_{i=1}^{S} β_i^c ⊙ (W_3 h_i)
where σ(·) is the sigmoid or ReLU activation function, W_3, W_4, W_5, and b_2 are trainable parameters, and S is the total number of sentences;
the edge representations E_{uv}^u and E_{uv}^v perceived by entities u and v are concatenated and fed into a fully connected layer to obtain an edge representation that perceives both entities simultaneously:
E_{uv} = tanh(W_sg [E_{uv}^u; E_{uv}^v] + b_sg)
where ";" denotes the concatenation operation and W_sg and b_sg are trainable parameters;
through the above calculation, the initial edge representation matrix E^(1) of the CAGGC network is obtained;
a weighted adjacency matrix is computed that comprehensively considers node information and edge information; the weight between nodes u and v, denoted Ã_uv, is calculated as:
Ã_uv = exp(W σ(W_u P_u + W_v P_v + W_e E_{uv})) / Σ_{w=1}^{N} exp(W σ(W_u P_u + W_v P_w + W_e E_{uw}))
where W, W_u, W_v, and W_e are trainable parameters, and exp denotes the exponential function with base e;
the GCGCN model also blends the edge representations into the graph convolution operation, updating the node representations with rich context information; the two hierarchical graph convolution inference modules of GCGCN each contain K densely connected sublayers, and the result of node v passing through the k-th sublayer is:
P_v^(k) = σ(Σ_{u=1}^{N} Ã_uv W^(k) g_u^(k) + b^(k))
where W^(k) and b^(k) are the trainable parameters of the k-th sublayer;
dense connections fuse the initial node representation with the outputs of the previous k-1 sublayers as the input of the current sublayer:
g_u^(k) = [P_u^(0); P_u^(1); …; P_u^(k-1)]
(3) Multi-head attention-guided graph convolution module
collecting the interactions between all nodes, in particular nodes connected by multi-hop paths, using multi-head attention;
owing to the multi-head attention mechanism, MAGGC expands the partially connected graph used in the previous module into a weighted fully connected graph; to compute the edge representations first, the MAGGC module replaces P^(0) in the CAGGC module with P^(1) and computes the entity-aware edge representation matrix E^(2) in the same manner; if entities u and v do not appear together in any sentence, the edge E_{uv}^(2) is a zero vector;
MAGGC computes the adjacency matrix directly with the self-attention mechanism:
Ã = softmax((P W_Q)(P W_K)⊤ / √d)
where W_Q and W_K are trainable parameters and d is the vector dimension;
since multi-head attention comprises multiple self-attention heads, t different adjacency matrices {Ã_1, …, Ã_t} are computed with the above formula; the resulting t output representations {P_1^(2); P_2^(2); …; P_t^(2)} are first reduced in dimension and then concatenated to obtain the output P^(2) of the MAGGC module;
(4) Relationship classification layer
concatenating the initial node representation obtained from the encoding layer and the node representations computed by the two graph convolution inference modules, feeding them into a fully connected layer, and applying an activation function to obtain the final node representation:
P = tanh(W_p [P^(0); P^(1); P^(2)] + b_p)
where P^(0) is the initial node representation, P^(1) and P^(2) are the node representations output by the CAGGC and MAGGC modules respectively, and W_p and b_p are trainable parameters;
concatenating the entity representations and relative distance vectors, and obtaining the entity-pair relation features for relation classification with a bilinear function and a fully connected layer:
P_u′ = [P_u; E(d_{u,v})]
P_v′ = [P_v; E(d_{v,u})]
P(r|u,v) = sigmoid(P_u′⊤ W_r P_v′ + W_t [P_u′; P_v′] + b_r)
where ";" denotes the concatenation operation, d_{u,v} and d_{v,u} are the relative distances of the first mentions of the two entities, and E is the mapping matrix of the relative distance vectors;
because the distant supervision corpus contains multiple relations, a binary cross-entropy loss for multi-label classification is used to compute the loss value during training:
L_T = - Σ_{(u,v)∈S} Σ_{r∈R} [II(r ∈ y_{u,v}) log P(r|u,v) + (1 - II(r ∈ y_{u,v})) log(1 - P(r|u,v))]
where S denotes the entire training set, II(·) is the indicator function, y_{u,v} is the set of labeled relations of entity pair (u, v), and R is the set of predefined relation types;
step five: entity fusion based on knowledge base and biomedical text information
Aligning the text-based entity representation with the translation model-based entity representation results in an entity alignment penalty L A, i.e., minimizing:
Wherein D (P i,ej) is the distance of the textual entity representation P i from the translation model-based entity representation e i; the matrix M is used to map the entity representation P i of the text to the space of the entity representation e i of the translation model:
D(Pi,ei)=||MPi-ei||
and researching the interrelationships among the knowledge base loss L K, the text loss L T and the alignment loss L A according to the credibility and consistency of the knowledge base and the text information, and obtaining the optimal knowledge representation fusing the biomedical knowledge base and the text information.
CN202111240396.3A 2021-10-25 2021-10-25 Knowledge representation learning method based on graph convolution network and translation model Active CN114021584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111240396.3A CN114021584B (en) 2021-10-25 2021-10-25 Knowledge representation learning method based on graph convolution network and translation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111240396.3A CN114021584B (en) 2021-10-25 2021-10-25 Knowledge representation learning method based on graph convolution network and translation model

Publications (2)

Publication Number Publication Date
CN114021584A (en) 2022-02-08
CN114021584B (en) 2024-05-10

Family

ID=80057414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111240396.3A Active CN114021584B (en) 2021-10-25 2021-10-25 Knowledge representation learning method based on graph convolution network and translation model

Country Status (1)

Country Link
CN (1) CN114021584B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114254655B (en) * 2022-02-28 2022-05-10 南京众智维信息科技有限公司 Network security tracing semantic identification method based on prompt self-supervision learning
CN116756596B (en) * 2023-08-17 2023-11-14 智慧眼科技股份有限公司 Text clustering model training method, text clustering device and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538848A (en) * 2020-04-29 2020-08-14 华中科技大学 Knowledge representation learning method fusing multi-source information
CN112507699A (en) * 2020-09-16 2021-03-16 东南大学 Remote supervision relation extraction method based on graph convolution network
CN113254663A (en) * 2021-04-21 2021-08-13 浙江工业大学 Knowledge graph joint representation learning method integrating graph convolution and translation model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074301A1 (en) * 2018-09-04 2020-03-05 Beijing Jingdong Shangke Information Technology Co., Ltd. End-to-end structure-aware convolutional networks for knowledge base completion
KR102524766B1 (en) * 2019-12-17 2023-04-24 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 Natural language and knowledge graph-based expression learning method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538848A (en) * 2020-04-29 2020-08-14 华中科技大学 Knowledge representation learning method fusing multi-source information
CN112507699A (en) * 2020-09-16 2021-03-16 东南大学 Remote supervision relation extraction method based on graph convolution network
CN113254663A (en) * 2021-04-21 2021-08-13 浙江工业大学 Knowledge graph joint representation learning method integrating graph convolution and translation model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Global Context-enhanced Graph Convolutional Networks for Document-level Relation Extraction; Huiwei Zhou et al.; Proceedings of the 28th International Conference on Computational Linguistics; 2020-12-13; 5259-5270 *

Also Published As

Publication number Publication date
CN114021584A (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN108959252B (en) Semi-supervised Chinese named entity recognition method based on deep learning
CN107992597B (en) Text structuring method for power grid fault case
CN111382272B (en) Electronic medical record ICD automatic coding method based on knowledge graph
CN111159407B (en) Method, apparatus, device and medium for training entity recognition and relation classification model
CN112487143A (en) Public opinion big data analysis-based multi-label text classification method
CN110083700A (en) A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks
CN106980608A (en) A kind of Chinese electronic health record participle and name entity recognition method and system
CN109934261A (en) A kind of Knowledge driving parameter transformation model and its few sample learning method
CN114021584B (en) Knowledge representation learning method based on graph convolution network and translation model
CN113553440B (en) Medical entity relationship extraction method based on hierarchical reasoning
CN111554360A (en) Drug relocation prediction method based on biomedical literature and domain knowledge data
CN110457479A (en) A kind of judgement document's analysis method based on criminal offence chain
CN105404632A (en) Deep neural network based biomedical text serialization labeling system and method
CN109960728A (en) A kind of open field conferencing information name entity recognition method and system
CN112989841A (en) Semi-supervised learning method for emergency news identification and classification
CN112308326A (en) Biological network link prediction method based on meta-path and bidirectional encoder
CN114239585A (en) Biomedical nested named entity recognition method
CN114548099B (en) Method for extracting and detecting aspect words and aspect categories jointly based on multitasking framework
CN116484024A (en) Multi-level knowledge base construction method based on knowledge graph
CN114911945A (en) Knowledge graph-based multi-value chain data management auxiliary decision model construction method
CN111582506A (en) Multi-label learning method based on global and local label relation
CN114077673A (en) Knowledge graph construction method based on BTBC model
CN114781382A (en) Medical named entity recognition system and method based on RWLSTM model fusion
CN116932661A (en) Event knowledge graph construction method oriented to network security
CN112069825B (en) Entity relation joint extraction method for alert condition record data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant