CN114610897A - Medical knowledge graph relation prediction method based on graph attention mechanism - Google Patents

Medical knowledge graph relation prediction method based on graph attention mechanism

Info

Publication number: CN114610897A
Application number: CN202210181938.2A
Authority: CN (China)
Prior art keywords: entity, embedding, matrix, attention, vector
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 何坚, 苗宁, 张仰, 陈建辉
Current Assignee: Beijing University of Technology
Original Assignee: Beijing University of Technology
Application filed by Beijing University of Technology
Priority to CN202210181938.2A
Publication of CN114610897A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/288Entity relationship models


Abstract

A medical knowledge graph relation prediction method based on a graph attention mechanism belongs to the field of electronic information. The invention comprises the following three points: (1) different weights (attention values) are assigned to nearby nodes, and attention is propagated through iterative, layer-wise computation; (2) auxiliary edges are introduced between multi-hop neighbors so that knowledge flows effectively between entities, and a graph-attention-based embedding model is constructed; (3) ConvKB is applied as the decoder, effectively capturing the associations between entities and their neighbors. For the relation prediction task in medical knowledge graphs, the invention extends the graph attention mechanism to build a graph-attention-based embedding model that captures entity and relation features across the multi-hop neighborhood of a given entity, thereby completing the association relations among entities in the medical knowledge graph.

Description

Medical knowledge graph relation prediction method based on graph attention mechanism
Technical Field
The invention belongs to the field of electronic information and relates to a graph-neural-network-based technique applicable to relation prediction in medical knowledge graphs.
Background
A knowledge graph is a structured representation of real-world information, yet even the most advanced knowledge graphs are incomplete and under continuous refinement. Relation prediction is a technique that infers missing facts from the entities already present in a knowledge graph, and it can complete and enhance the graph. Recent studies have shown that models based on convolutional neural networks (CNNs) generate rich, more expressive feature embeddings and therefore perform well in relation prediction. However, such knowledge graph models process triples independently and fail to capture the complex, hidden information inherent in the local neighborhood around a triple.
To address these problems of convolutional neural network models in knowledge graph relation prediction, a feature embedding method based on the graph attention mechanism is proposed that captures the association relations between entities and their neighborhoods.
Disclosure of Invention
To address the limitation that translation-distance models and convolutional neural networks (CNNs) can only process single triples independently and struggle to capture relations within the neighborhood of a given entity, the invention proposes an attention-based graph embedding method for relation prediction, realizing a more expressive knowledge graph relation prediction technique.
The invention comprises the following three points:
(1) Different weights (attention values) are assigned to nearby nodes, and attention is propagated through iterative, layer-wise computation.
(2) Auxiliary edges are introduced between multi-hop neighbors so that knowledge flows effectively between entities, and a graph-attention-based embedding model is constructed.
(3) ConvKB is applied as the decoder, effectively capturing the associations between an entity and its neighbors.
The core algorithm of the invention is as follows:
(1) graph attention-based graph embedding method for relation prediction
Entities in the knowledge graph play different roles under different relations, yet Graph Attention Networks (GATs) ignore the role of relations in the knowledge graph. A novel graph-attention-based graph embedding method is therefore proposed that incorporates the features of both relations and neighboring nodes into the attention mechanism.
Unlike GATs, the inputs to each layer of the model contain entity embedding matrices and relationship embedding matrices.
The entity embedding matrix H is given by formula (1):

H ∈ R^{N_e × T}  (1)

where H denotes the feature matrix of the entities, N_e is the total number of entities, and T is the embedded feature dimension of each entity.

The relation embedding matrix G is given by formula (2):

G ∈ R^{N_r × P}  (2)

where G denotes the feature matrix of the relations, N_r is the number of relations, and P is the feature dimension of the relation embeddings.
Referring to the network architecture of FIG. 1, the updated embedding matrices H′ and G′ are calculated from the input matrices H and G as follows:
the method comprises the following steps: defining a set of entities E ═ { E ] in a medical knowledge graph1,…,ei,…,en},eiIs the embedding of the ith entity. First, study with eiRepresentation of each triplet associated to obtain entity ei
New embedding of (2). By targeting specific triplets
Figure BDA0003521544920000025
The entity and relationship feature vectors of (3) perform a linear transformation to learn these embeddings as in equation (3).
Figure BDA0003521544920000026
Wherein the content of the first and second substances,
Figure BDA0003521544920000027
is a triplet
Figure BDA0003521544920000028
Is represented by a vector of (a).
Figure BDA0003521544920000029
Are respectively entity ei、ejAnd relation rkEmbedded representation of W1Is a linear transformation matrix.
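A small sketch of formula (3), continuing the tensors H and G defined above; the output dimension T_out and the helper name triple_repr are assumptions for illustration.

```python
import torch
import torch.nn as nn

T_out = 200                                   # output feature dimension (assumed)
W1 = nn.Linear(T + T + P, T_out, bias=False)  # the linear map W_1 of formula (3)

def triple_repr(i: int, j: int, k: int) -> torch.Tensor:
    """c_ijk = W1 [h_i ; h_j ; g_k] for the triple (e_i, r_k, e_j)."""
    return W1(torch.cat([H[i], H[j], G[k]], dim=-1))
```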
Step two: the importance of each triple, i.e. the attention coefficient b_ijk, is obtained following the same idea as GATs. As shown in formula (4), a linear transformation is first performed and a non-linear activation function is then applied to obtain b_ijk, where W_2 is a linear transformation matrix:

b_ijk = LeakyReLU(W_2 c_ijk)  (4)

Furthermore, the attention coefficients are normalized with the softmax of formula (5) to obtain the relative attention values α_ijk:

α_ijk = softmax_jk(b_ijk) = exp(b_ijk) / Σ_{n ∈ N_i} Σ_{r ∈ R_in} exp(b_inr)  (5)

where N_i denotes the set of entities adjacent to e_i, R_in denotes the set of relations connecting entities e_i and e_n, and b_inr is the attention coefficient of a triple involving an entity adjacent to e_i.
After the normalized attention coefficients are obtained, the updated embedding vector is calculated according to formula (6). The model adopts a multi-head attention mechanism so that it learns relevant information in different representation subspaces, which stabilizes the learning process. In addition, to reduce the output dimension, the final embedding vector is obtained by averaging over the heads:

h′_i = (1/M) Σ_{m=1}^{M} σ( Σ_{j ∈ N_i} Σ_{k ∈ R_ij} α^m_ijk c^m_ijk )  (6)

where M is the number of attention heads and σ denotes a non-linear function; j indexes the entities adjacent to entity e_i, and k indexes the relations between entity e_i and entity e_j.
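The following sketch puts formulas (4)-(6) together for a batch of triples, assuming PyTorch. The grouping loop implementing the softmax of formula (5) is written for clarity rather than speed, and the per-head weight lists W1_heads and W2_heads are assumptions, as is the choice of sigmoid for σ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def attention_layer(H, G, triples, W1_heads, W2_heads, num_heads=2):
    """One graph-attention layer over an (N, 3) LongTensor of
    (head, tail, relation) index triples, per formulas (3)-(6)."""
    h_i, h_j, g_k = H[triples[:, 0]], H[triples[:, 1]], G[triples[:, 2]]
    head_outputs = []
    for m in range(num_heads):
        c = W1_heads[m](torch.cat([h_i, h_j, g_k], dim=-1))              # formula (3)
        b = F.leaky_relu(W2_heads[m](c), negative_slope=0.2).squeeze(-1)  # formula (4)
        alpha = torch.zeros_like(b)              # formula (5): softmax over
        for i in triples[:, 0].unique():         # all triples that share the
            mask = triples[:, 0] == i            # same head entity e_i
            alpha[mask] = F.softmax(b[mask], dim=0)
        out = torch.zeros(H.size(0), c.size(-1))                      # formula (6):
        out.index_add_(0, triples[:, 0], alpha.unsqueeze(-1) * c)     # weighted sum
        head_outputs.append(torch.sigmoid(out))                       # σ non-linearity
    return torch.stack(head_outputs).mean(dim=0)                      # average M heads
```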
Step three: as shown in formula (7), a weight matrix W_R is used to linearly transform the relation matrix G, yielding the new relation embedding matrix:

G′ = G W_R  (7)

where W_R ∈ R^{T × T′} and T′ is the dimension of the relation embedding vectors output by this layer.

Step four: the entity weight matrix W_E ∈ R^{T_i × T_j}, where T_i and T_j denote the dimensions of the initial and final entity embedding vectors respectively, is applied to the entity embedding matrix H_i of the input model. The initial entity embedding matrix H_i is linearly transformed according to formula (8) to obtain the transformed entity embedding matrix H_t:

H_t = W_E H_i  (8)

Step five: according to formula (9), the transformed initial entity embedding matrix H_t is added to the entity embedding matrix H_f obtained from the last attention layer, yielding the updated entity embedding matrix H″:

H″ = H_t + H_f  (9)

where H_f ∈ R^{N_e × T_f} is the entity embedding matrix output by the last attention layer, N_e is the total number of entities, and T_f is the feature dimension of the final entity embedding vectors.
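Steps three to five reduce to two linear maps plus a residual addition; a sketch continuing the names above follows. The output dimension T_prime is assumed, and the attention layer's output dimension must equal it for the addition in formula (9) to be valid.

```python
import torch.nn as nn

T_prime = 200                             # output dimension of this layer (assumed)
W_R = nn.Linear(P, T_prime, bias=False)   # formula (7): G' = G·W_R
W_E = nn.Linear(T, T_prime, bias=False)   # formula (8): H_t = W_E·H_i

G_prime = W_R(G)        # new relation embedding matrix G'
H_t = W_E(H)            # transformed copy of the initial entity embeddings
H_f = attention_layer(H, G, triples, W1_heads, W2_heads)  # last attention layer output
H_pp = H_t + H_f        # formula (9): H'' = H_t + H_f (dimensions must match)
```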
In addition, k-hop adjacency with k > 1 (the dotted directed line segments in FIG. 2) is defined as an auxiliary relation in the knowledge graph, and the embedding of an auxiliary relation is the sum of the embeddings of all relations along the directed path. Thus, for a multi-layer model, the updated embedding vectors at the s-th layer can be calculated by aggregating the neighbors within s hops, as in the sketch below.
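A hedged sketch of building the 2-hop auxiliary edges: given `triples` as (head, tail, relation) index tuples, each 2-hop path contributes one auxiliary triple whose relation embedding is the sum of the relation embeddings along the path. The index conventions here are assumptions for illustration.

```python
from collections import defaultdict

def add_two_hop_edges(triples, G):
    """For every path e_i -r_a-> e_m -r_b-> e_j, add an auxiliary triple
    (i, j, new_relation) whose embedding is g_a + g_b (sum along the path)."""
    out_edges = defaultdict(list)                  # head -> [(tail, relation)]
    for i, j, k in triples:
        out_edges[i].append((j, k))
    aux_triples, aux_embeddings = [], []
    for i, hops in list(out_edges.items()):
        for m, k1 in hops:
            for j, k2 in out_edges.get(m, []):
                aux_triples.append((i, j, len(G) + len(aux_embeddings)))
                aux_embeddings.append(G[k1] + G[k2])   # path-sum embedding
    return aux_triples, aux_embeddings
```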
The graph attention network takes the triples present in the knowledge graph as valid triples t_ijk and uses them as positive training examples. Invalid triples t′_ijk, formed by randomly replacing the head or tail entity of a triple with another entity, serve as negative training examples.

Meanwhile, the invention learns the embeddings using the idea of a translation scoring function: for a given valid triple t_ijk = (e_i, r_k, e_j), it holds that h_i + g_k ≈ h_j, where e_i is an entity, e_j is a neighboring entity of e_i, and r_k is the relation between e_i and e_j; h_i is the embedding vector of entity e_i, g_k is the embedding vector of relation r_k, and h_j is the embedding vector of entity e_j.
In model training, the entity and relation embedding learning minimizes the L1-norm dissimilarity measure d_t_ijk = ‖h_i + g_k − h_j‖_1 and uses the hinge (margin) loss function shown in formula (10):

L(Ω) = Σ_{t_ijk ∈ S} Σ_{t′_ijk ∈ S′} max( d_t_ijk − d_t′_ijk + γ, 0 )  (10)

where γ > 0 is a margin hyper-parameter, S is the set of valid triples, and S′ is the set of invalid triples. S′ is constructed according to formula (11) and comprises triples obtained by replacing the head entity as well as triples obtained by replacing the tail entity:

S′ = { (e′_i, r_k, e_j) | e′_i ∈ E ∖ {e_i} } ∪ { (e_i, r_k, e′_j) | e′_j ∈ E ∖ {e_j} }  (11)
(2) ConvKB decoder based on convolutional neural network
The model uses ConvKB as the decoder: its convolutional layers analyze the global embedding features of the triple t_ijk = (e_i, r_k, e_j) across different dimensions, thereby generalizing the transitional characteristics of the model. During decoding, the model computes feature-map scores according to formula (12):

f(t_ijk) = ( ∥_{q=1}^{Ω} ReLU([h_i, g_k, h_j] ∗ ω^q) ) · W  (12)

where ω^q denotes the q-th filter, Ω is the number of filters, ∗ is the convolution operator, ∥ denotes concatenation, and W ∈ R^{Ωk×1} is a linear transformation matrix used to compute the final score of the triple.
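A sketch of the ConvKB scoring of formula (12) in PyTorch: the triple is stacked into a dim×3 grid, Ω filters of size 1×3 slide over it, and the concatenated feature maps are projected to a scalar score. Class and argument names are assumptions.

```python
import torch
import torch.nn as nn

class ConvKB(nn.Module):
    """Decoder scoring a triple per formula (12)."""
    def __init__(self, dim: int, num_filters: int):
        super().__init__()
        self.conv = nn.Conv2d(1, num_filters, kernel_size=(1, 3))  # the ω^q filters
        self.W = nn.Linear(num_filters * dim, 1, bias=False)       # final projection W

    def forward(self, h_i, g_k, h_j):
        x = torch.stack([h_i, g_k, h_j], dim=-1)    # (batch, dim, 3) triple matrix
        x = torch.relu(self.conv(x.unsqueeze(1)))   # (batch, Ω, dim, 1) feature maps
        return self.W(x.flatten(start_dim=1))       # concatenate, then score
```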
To improve the generalization ability of the model, the soft-margin loss function shown in formula (13) is adopted during training:

L = Σ_{t_ijk ∈ S ∪ S′} log( 1 + exp( l_t_ijk · f(t_ijk) ) ) + (λ/2) ‖W‖²_2  (13)

where l_t_ijk is the label distinguishing positive from negative examples: l_t_ijk = 1 when t_ijk ∈ S, and l_t_ijk = −1 when t_ijk ∈ S′; λ is the hyper-parameter of the L2 regularization term.
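The soft-margin loss of formula (13) is essentially a softplus over label-weighted scores plus L2 regularization of W; a sketch follows, with the label convention (+1 for S, −1 for S′) taken from the reconstruction above.

```python
import torch
import torch.nn.functional as F

def soft_margin_loss(model: ConvKB, scores, labels, lam=1e-5):
    """Formula (13): mean softplus(l · f(t)) plus (λ/2)·||W||²."""
    loss = F.softplus(labels * scores.squeeze(-1)).mean()  # log(1 + exp(l·f))
    reg = model.W.weight.norm(p=2) ** 2                    # L2 penalty on W
    return loss + lam / 2 * reg
```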
Effects of the invention
For the relation prediction task in medical knowledge graphs, a graph-attention-based embedding model is constructed by extending the graph attention mechanism; it captures entity and relation features across the multi-hop neighborhood of a given entity, thereby completing the association relations among entities in the medical knowledge graph.
Drawings
FIG. 1 is a diagram of a network architecture;
FIG. 2 is a schematic view of an auxiliary relational edge;
FIG. 3 is a schematic illustration of an attention mechanism;
FIG. 4 shows the structure of ConvKB.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described in detail and completely below with reference to specific embodiments and the accompanying drawings.
Knowledge graphs constructed in their early stages suffer from limited knowledge sources, so their knowledge coverage is insufficient and a large amount of related knowledge is missing. Therefore, in the medical field, relation prediction is needed to complete the knowledge in medical knowledge graphs.
The embodiment provides a knowledge graph relation completion workflow implemented in the following steps:
1) GATs are trained to encode entity and relation information.
2) ConvKB is trained as a decoder to perform the relation prediction task.
The relevant definitions of the invention are given below:

Definition 1 (knowledge graph, 𝒢): 𝒢 = (E, R) denotes all the entities and relations contained in the knowledge graph.

Definition 2 (entity set, E): E = {e_1, …, e_i, …, e_n} denotes all entity nodes in the knowledge graph, corresponding to the entity set in the knowledge base.

Definition 3 (relation set, R): R = {r_1, …, r_i, …, r_n} denotes all relation edges in the knowledge graph, corresponding to the relation set in the knowledge base.

Definition 4 (triple, t_ijk): t_ijk = (e_i, r_k, e_j), where e_i denotes the head entity, r_k denotes the relation, and e_j denotes the tail entity, with e_i, e_j ∈ E and r_k ∈ R. A triple is also called a piece of knowledge.
1) First stage
The GAT is trained to encode information about entities and relationships in the graph for better, more expressive embedding.
S1: acquire the medical knowledge graph to be processed and convert all knowledge in the knowledge graph into a knowledge base file in triple storage form. The Neo4j graph database is used to convert the knowledge graph into an RDF knowledge base file in triple storage form, as sketched below.
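A hedged sketch of this export using the neo4j Python driver; the connection URI, credentials, the `name` property, and the output file name are placeholders, since the patent does not specify the graph schema.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # export every relationship as a (head, relation, tail) triple
    records = session.run(
        "MATCH (h)-[r]->(t) RETURN h.name AS head, type(r) AS rel, t.name AS tail"
    )
    with open("medical_kg_triples.tsv", "w", encoding="utf-8") as f:
        for rec in records:
            f.write(f"{rec['head']}\t{rec['rel']}\t{rec['tail']}\n")
driver.close()
```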
S2: because a computer cannot recognize triples in text form, the triples must be converted into a word-vector-space representation before being input into the neural network for relation determination.

Borrowing the idea of a translation scoring function, the invention uses TransE to learn the initial embeddings: for a given valid triple t_ijk = (e_i, r_k, e_j), it holds that h_i + g_k ≈ h_j, where e_i is an entity, e_j is a neighboring entity of e_i, and r_k is the relation between e_i and e_j; h_i, g_k, and h_j are the embedding vectors of e_i, r_k, and e_j respectively.

The entity embedding matrix H is given by formula (1):

H ∈ R^{N_e × T}  (1)

where H denotes the feature matrix of the entities, N_e is the total number of entities, and T is the embedded feature dimension of each entity.

The relation embedding matrix G is given by formula (2):

G ∈ R^{N_r × P}  (2)

where G denotes the feature matrix of the relations, N_r is the number of relations, and P is the feature dimension of the relation embeddings.
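A minimal sketch of this TransE initialization step, reusing the dimensions N_e, T, N_r, and P from above (with P = T so that h_i + g_k − h_j is well defined); the optimizer, learning rate, and margin are illustrative assumptions, not values from the patent.

```python
import torch

entity_emb = torch.nn.Embedding(N_e, T)      # h_i for every entity
relation_emb = torch.nn.Embedding(N_r, P)    # g_k for every relation (P == T here)
opt = torch.optim.SGD(list(entity_emb.parameters())
                      + list(relation_emb.parameters()), lr=0.01)

def transe_step(pos, neg, gamma=1.0):
    """One TransE update on paired (N, 3) positive/negative index tensors."""
    def dist(t):
        return (entity_emb(t[:, 0]) + relation_emb(t[:, 2])
                - entity_emb(t[:, 1])).norm(p=1, dim=-1)
    loss = torch.clamp(dist(pos) - dist(neg) + gamma, min=0).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```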
S3: for each entity, the vector representations of all triples associated with it must be learned to obtain a new embedding of the entity. These embeddings are learned by applying a linear transformation to the entity and relation feature vectors of a specific triple t_ijk = (e_i, r_k, e_j), as in formula (3):

c_ijk = W_1 [h_i ; h_j ; g_k]  (3)

where c_ijk is the vector representation of triple t_ijk; h_i, h_j, and g_k are the embedded representations of entities e_i, e_j and relation r_k respectively, and W_1 is a linear transformation matrix.
S4: to map the input features into a higher-dimensional output feature space, a linear transformation is first performed and a non-linear activation function is then applied to obtain the attention coefficient b_ijk, which represents the importance of each triple.

In formula (4), W_2 is a linear transformation matrix. To avoid problems such as vanishing gradients, LeakyReLU is used as the activation function; based on multiple comparison experiments and the literature, its negative-slope hyper-parameter is set to 0.2.

b_ijk = LeakyReLU(W_2 c_ijk)  (4)
S5: a softmax operation is applied to the raw attention scores over all incoming edges of a node; the attention coefficients are normalized according to formula (5) to obtain the relative attention values α_ijk:

α_ijk = softmax_jk(b_ijk) = exp(b_ijk) / Σ_{n ∈ N_i} Σ_{r ∈ R_in} exp(b_inr)  (5)

where N_i denotes the set of entities adjacent to e_i, R_in denotes the set of relations connecting entities e_i and e_n, and b_inr is the attention coefficient of a triple involving an entity adjacent to e_i.
S6: after the normalized attention coefficients are obtained, the updated embedding vector is calculated according to formula (6).

The model adopts a multi-head attention mechanism so that it learns relevant information in different representation subspaces, which stabilizes the learning process. In addition, to reduce the output dimension, the final embedding vector is obtained by averaging over the heads:

h′_i = (1/M) Σ_{m=1}^{M} σ( Σ_{j ∈ N_i} Σ_{k ∈ R_ij} α^m_ijk c^m_ijk )  (6)

where M is the number of attention heads and σ denotes a non-linear function; j indexes the entities adjacent to entity e_i, and k indexes the relations between entity e_i and entity e_j.
In addition, the method applies a dropout operation that discards neuron activations with probability 0.3, avoiding problems such as overfitting.
S7: as shown in formula (7), a weight matrix W_R is used to linearly transform the relation matrix G, yielding the new relation embedding matrix:

G′ = G W_R  (7)

where W_R ∈ R^{T × T′} and T′ is the dimension of the relation embedding vectors output by this layer.
S8: the initial entity embedding matrix H_i is linearly transformed according to formula (8) to obtain the transformed entity embedding matrix H_t:

H_t = W_E H_i  (8)

where the entity weight matrix W_E ∈ R^{T_i × T_j}, with T_i and T_j denoting the dimensions of the initial and final entity embedding vectors respectively, and H_i is the entity embedding matrix of the input model.
S9: when new embedding vectors are learned, the original embedding information may be lost, so according to formula (9) the transformed initial entity embedding matrix H_t is added to the entity embedding matrix H_f obtained from the last attention layer, yielding the updated entity embedding matrix H″:

H″ = H_t + H_f  (9)

where H_f ∈ R^{N_e × T_f} is the entity embedding matrix output by the last attention layer, N_e is the total number of entities, and T_f is the feature dimension of the final entity embedding vectors.
S10: the graph attention network takes the triples present in the knowledge graph as valid triples t_ijk and uses them as positive training examples. Invalid triples t′_ijk, formed by randomly replacing the head or tail entity of a triple with another entity, serve as negative training examples. In this training phase, the ratio of the number of valid triples to invalid triples is 2:1.
In model training, the entity and relation embedding learning minimizes the L1-norm dissimilarity measure d_t_ijk = ‖h_i + g_k − h_j‖_1 and uses the hinge (margin) loss function shown in formula (10):

L(Ω) = Σ_{t_ijk ∈ S} Σ_{t′_ijk ∈ S′} max( d_t_ijk − d_t′_ijk + γ, 0 )  (10)

where γ > 0 is a margin hyper-parameter with value 5, S is the set of valid triples, and S′ is the set of invalid triples. S′ is constructed according to formula (11) and comprises triples obtained by replacing the head entity as well as triples obtained by replacing the tail entity, where E ∖ {e_i} denotes the entity set E with e_i removed:

S′ = { (e′_i, r_k, e_j) | e′_i ∈ E ∖ {e_i} } ∪ { (e_i, r_k, e′_j) | e′_j ∈ E ∖ {e_j} }  (11)
In addition, the model is continuously optimized using the Adam optimizer with a learning rate of 0.001, as in the sketch below.
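A sketch of that optimizer setup, assuming the encoder's parameters are collected in a module `gat_model` and reusing `margin_loss` from the earlier sketch; everything except the 0.001 learning rate and the margin value 5 stated above is a placeholder.

```python
import torch

optimizer = torch.optim.Adam(gat_model.parameters(), lr=0.001)

num_epochs = 1000                      # assumed placeholder
for epoch in range(num_epochs):
    optimizer.zero_grad()
    loss = margin_loss(H, G, pos_triples, neg_triples, gamma=5.0)  # formula (10)
    loss.backward()
    optimizer.step()
```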
2) Second stage
The model uses ConvKB as the decoder: its convolutional layers analyze the global embedding features of the triple t_ijk = (e_i, r_k, e_j) across different dimensions, thereby generalizing the transitional characteristics of the model.
S11: during decoding, the model computes feature-map scores according to formula (12):

f(t_ijk) = ( ∥_{q=1}^{Ω} ReLU([h_i, g_k, h_j] ∗ ω^q) ) · W  (12)

where ω^q denotes the q-th filter, Ω is the number of filters, ∗ is the convolution operator, ∥ denotes concatenation, and W ∈ R^{Ωk×1} is a linear transformation matrix used to compute the final score of the triple. Here c_ijk, as defined in formula (3), is the vector representation of the triple t_ijk.

Based on multiple comparison experiments, and to avoid problems such as vanishing gradients, LeakyReLU with a negative-slope hyper-parameter of 0.2 is adopted as the activation function.
S12: to improve the generalization ability of the model, the soft-margin loss function shown in formula (13) is adopted during training:

L = Σ_{t_ijk ∈ S ∪ S′} log( 1 + exp( l_t_ijk · f(t_ijk) ) ) + (λ/2) ‖W‖²_2  (13)

where l_t_ijk is the label distinguishing positive from negative examples: l_t_ijk = 1 when t_ijk ∈ S, and l_t_ijk = −1 when t_ijk ∈ S′; λ is the hyper-parameter of the L2 regularization term and takes the value 0.00001.

In this training phase, the ratio of the number of valid triples to invalid triples is 4:1.

Claims (2)

1. The medical knowledge graph relation prediction method based on the graph attention mechanism is characterized by comprising the following steps:
(1) graph attention-based graph embedding method for relation prediction
The input of each layer of the model comprises an entity embedding matrix and a relation embedding matrix; the entity embedding matrix H is given by formula (1):

H ∈ R^{N_e × T}  (1)

where H denotes the feature matrix of the entities, N_e is the total number of entities, and T is the embedded feature dimension of each entity;

the relation embedding matrix G is given by formula (2):

G ∈ R^{N_r × P}  (2)

where G denotes the feature matrix of the relations, N_r is the number of relations, and P is the feature dimension of the relation embeddings;

according to the input matrices H and G, the updated embedding matrices H′ and G′ are calculated by the following steps:

Step one: define the entity set of the medical knowledge graph E = {e_1, …, e_i, …, e_n}, where e_i is the embedding of the i-th entity; first, a vector representation of each triple associated with e_i is learned to obtain a new embedding of entity e_i; these embeddings are learned by applying a linear transformation to the entity and relation feature vectors of a specific triple t_ijk = (e_i, r_k, e_j), as in formula (3):

c_ijk = W_1 [h_i ; h_j ; g_k]  (3)

where c_ijk is the vector representation of triple t_ijk; h_i, h_j, and g_k are the embedded representations of entities e_i, e_j and relation r_k respectively, and W_1 is a linear transformation matrix;
Step two: obtain the importance of each triple, i.e. the attention coefficient b_ijk; as shown in formula (4), a linear transformation is first performed and a non-linear activation function is then applied to obtain b_ijk, where W_2 is a linear transformation matrix:

b_ijk = LeakyReLU(W_2 c_ijk)  (4)

furthermore, the attention coefficients are normalized with the softmax of formula (5) to obtain the relative attention values α_ijk:

α_ijk = softmax_jk(b_ijk) = exp(b_ijk) / Σ_{n ∈ N_i} Σ_{r ∈ R_in} exp(b_inr)  (5)

where N_i denotes the set of entities adjacent to e_i, R_in denotes the set of relations connecting entities e_i and e_n, and b_inr is the attention coefficient of a triple involving an entity adjacent to e_i;

after the normalized attention coefficients are obtained, the updated embedding vector is calculated according to formula (6); the model adopts a multi-head attention mechanism so that it learns relevant information in different representation subspaces, stabilizing the learning process; in addition, to reduce the output dimension, the final embedding vector is obtained by averaging over the heads:

h′_i = (1/M) Σ_{m=1}^{M} σ( Σ_{j ∈ N_i} Σ_{k ∈ R_ij} α^m_ijk c^m_ijk )  (6)

where M is the number of attention heads and σ denotes a non-linear function; j indexes the entities adjacent to entity e_i, and k indexes the relations between entity e_i and entity e_j;
Step three: as shown in formula (7), a weight matrix W_R is used to linearly transform the relation matrix G, yielding the new relation embedding matrix:

G′ = G W_R  (7)

where W_R ∈ R^{T × T′} and T′ is the dimension of the relation embedding vectors output by this layer;

Step four: the entity weight matrix W_E ∈ R^{T_i × T_j}, where T_i and T_j denote the dimensions of the initial and final entity embedding vectors respectively, is applied to the entity embedding matrix H_i of the input model; the initial entity embedding matrix H_i is linearly transformed according to formula (8) to obtain the transformed entity embedding matrix H_t:

H_t = W_E H_i  (8)

Step five: according to formula (9), the transformed initial entity embedding matrix H_t is added to the entity embedding matrix H_f obtained from the last attention layer, yielding the updated entity embedding matrix H″:

H″ = H_t + H_f  (9)

where H_f ∈ R^{N_e × T_f} is the entity embedding matrix output by the last attention layer, N_e is the total number of entities, and T_f is the feature dimension of the final entity embedding vectors;
in addition, k-hop adjacency with k > 1 is defined as an auxiliary relation in the knowledge graph, and the embedding of an auxiliary relation is the sum of the embeddings of all relations along the directed path; thus, for a multi-layer model, the updated embedding vectors at the s-th layer can be calculated by aggregating the neighbors within s hops;

the graph attention network takes the triples present in the knowledge graph as valid triples t_ijk and uses them as positive training examples; invalid triples t′_ijk, formed by randomly replacing the head or tail entity of a triple with another entity, serve as negative training examples;
the embeddings are learned using the idea of a translation scoring function: for a given valid triple t_ijk = (e_i, r_k, e_j), it holds that h_i + g_k ≈ h_j, where e_i is an entity, e_j is a neighboring entity of e_i, and r_k is the relation between e_i and e_j; h_i is the embedding vector of entity e_i, g_k is the embedding vector of relation r_k, and h_j is the embedding vector of entity e_j;

in model training, the entity and relation embedding learning minimizes the L1-norm dissimilarity measure d_t_ijk = ‖h_i + g_k − h_j‖_1 and uses the hinge (margin) loss function shown in formula (10):

L(Ω) = Σ_{t_ijk ∈ S} Σ_{t′_ijk ∈ S′} max( d_t_ijk − d_t′_ijk + γ, 0 )  (10)

where γ > 0 is a margin hyper-parameter, S is the set of valid triples, and S′ is the set of invalid triples; S′ is constructed according to formula (11) and comprises triples obtained by replacing the head entity as well as triples obtained by replacing the tail entity:

S′ = { (e′_i, r_k, e_j) | e′_i ∈ E ∖ {e_i} } ∪ { (e_i, r_k, e′_j) | e′_j ∈ E ∖ {e_j} }  (11)
(2) ConvKB decoder based on convolutional neural network
The model uses ConvKB as the decoder: its convolutional layers analyze the global embedding features of the triple t_ijk = (e_i, r_k, e_j) across different dimensions, thereby generalizing the transitional characteristics of the model; during decoding, the model computes feature-map scores according to formula (12):

f(t_ijk) = ( ∥_{q=1}^{Ω} ReLU([h_i, g_k, h_j] ∗ ω^q) ) · W  (12)

where ω^q denotes the q-th filter, Ω is the number of filters, ∗ is the convolution operator, ∥ denotes concatenation, and W ∈ R^{Ωk×1} is a linear transformation matrix used to compute the final score of the triple;
the loss is calculated using the soft-margin loss function shown in formula (13):

L = Σ_{t_ijk ∈ S ∪ S′} log( 1 + exp( l_t_ijk · f(t_ijk) ) ) + (λ/2) ‖W‖²_2  (13)

where l_t_ijk is the label distinguishing positive from negative examples: l_t_ijk = 1 when t_ijk ∈ S, and l_t_ijk = −1 when t_ijk ∈ S′; λ is the hyper-parameter of the L2 regularization term.
2. The method of claim 1, characterized by the following relevant definitions:

Definition 1 (knowledge graph, 𝒢): 𝒢 = (E, R) denotes all the entities and relations contained in the knowledge graph;

Definition 2 (entity set, E): E = {e_1, …, e_i, …, e_n} denotes all entity nodes in the knowledge graph, corresponding to the entity set in the knowledge base;

Definition 3 (relation set, R): R = {r_1, …, r_i, …, r_n} denotes all relation edges in the knowledge graph, corresponding to the relation set in the knowledge base;

Definition 4 (triple, t_ijk): t_ijk = (e_i, r_k, e_j), where e_i denotes the head entity, r_k denotes the relation, and e_j denotes the tail entity, with e_i, e_j ∈ E and r_k ∈ R; a triple is also called a piece of knowledge;

First stage
The GAT is trained to encode the information of entities and relations in the graph;

S1: acquire the medical knowledge graph to be processed and convert all knowledge in the knowledge graph into a knowledge base file in triple storage form; the Neo4j graph database is used to convert the knowledge graph into an RDF knowledge base file in triple storage form;

S2: because a computer cannot recognize triples in text form, the triples must be converted into a word-vector-space representation before being input into the neural network for relation determination;

borrowing the idea of a translation scoring function, TransE is used to learn the initial embeddings: for a given valid triple t_ijk = (e_i, r_k, e_j), it holds that h_i + g_k ≈ h_j, where e_i is an entity, e_j is a neighboring entity of e_i, and r_k is the relation between e_i and e_j; h_i, g_k, and h_j are the embedding vectors of e_i, r_k, and e_j respectively;

the entity embedding matrix H is given by formula (1):

H ∈ R^{N_e × T}  (1)

where H denotes the feature matrix of the entities, N_e is the total number of entities, and T is the embedded feature dimension of each entity;

the relation embedding matrix G is given by formula (2):

G ∈ R^{N_r × P}  (2)

where G denotes the feature matrix of the relations, N_r is the number of relations, and P is the feature dimension of the relation embeddings;
S3: for each entity, the vector representations of all triples associated with it must be learned to obtain a new embedding of the entity; these embeddings are learned by applying a linear transformation to the entity and relation feature vectors of a specific triple t_ijk = (e_i, r_k, e_j), as in formula (3):

c_ijk = W_1 [h_i ; h_j ; g_k]  (3)

where c_ijk is the vector representation of triple t_ijk; h_i, h_j, and g_k are the embedded representations of entities e_i, e_j and relation r_k respectively, and W_1 is a linear transformation matrix;
S4: to map the input features into a higher-dimensional output feature space, a linear transformation is performed and a non-linear activation function is then applied to obtain the attention coefficient b_ijk, which represents the importance of each triple;

in formula (4), W_2 is a linear transformation matrix; to avoid problems such as vanishing gradients, LeakyReLU is used as the activation function, and based on multiple comparison experiments and the literature its negative-slope hyper-parameter is set to 0.2;

b_ijk = LeakyReLU(W_2 c_ijk)  (4)

S5: a softmax operation is applied to the raw attention scores over all incoming edges of a node; the attention coefficients are normalized according to formula (5) to obtain the relative attention values α_ijk:

α_ijk = softmax_jk(b_ijk) = exp(b_ijk) / Σ_{n ∈ N_i} Σ_{r ∈ R_in} exp(b_inr)  (5)

where N_i denotes the set of entities adjacent to e_i, R_in denotes the set of relations connecting entities e_i and e_n, and b_inr is the attention coefficient of a triple involving an entity adjacent to e_i;
S6: after the normalized attention coefficients are obtained, the updated embedding vector is calculated according to formula (6);

the model adopts a multi-head attention mechanism so that it learns relevant information in different representation subspaces, stabilizing the learning process; in addition, to reduce the output dimension, the final embedding vector is obtained by averaging over the heads:

h′_i = (1/M) Σ_{m=1}^{M} σ( Σ_{j ∈ N_i} Σ_{k ∈ R_ij} α^m_ijk c^m_ijk )  (6)

where M is the number of attention heads and σ denotes a non-linear function; j indexes the entities adjacent to entity e_i, and k indexes the relations between entity e_i and entity e_j;

furthermore, a dropout operation discards neuron activations with probability 0.3;
S7: as shown in formula (7), a weight matrix W_R is used to linearly transform the relation matrix G, yielding the new relation embedding matrix:

G′ = G W_R  (7)

where W_R ∈ R^{T × T′} and T′ is the dimension of the relation embedding vectors output by this layer;

S8: the initial entity embedding matrix H_i is linearly transformed according to formula (8) to obtain the transformed entity embedding matrix H_t:

H_t = W_E H_i  (8)

where the entity weight matrix W_E ∈ R^{T_i × T_j}, with T_i and T_j denoting the dimensions of the initial and final entity embedding vectors respectively, and H_i is the entity embedding matrix of the input model;

S9: when new embedding vectors are learned, the original embedding information may be lost, so according to formula (9) the transformed initial entity embedding matrix H_t is added to the entity embedding matrix H_f obtained from the last attention layer, yielding the updated entity embedding matrix H″:

H″ = H_t + H_f  (9)

where H_f ∈ R^{N_e × T_f} is the entity embedding matrix output by the last attention layer, N_e is the total number of entities, and T_f is the feature dimension of the final entity embedding vectors;
S10: the graph attention network takes the triples present in the knowledge graph as valid triples t_ijk and uses them as positive training examples; invalid triples t′_ijk, formed by randomly replacing the head or tail entity of a triple with another entity, serve as negative training examples; during this training phase, the ratio of the number of valid triples to invalid triples is 2:1;

in model training, the entity and relation embedding learning minimizes the L1-norm dissimilarity measure d_t_ijk = ‖h_i + g_k − h_j‖_1 and uses the hinge (margin) loss function shown in formula (10):

L(Ω) = Σ_{t_ijk ∈ S} Σ_{t′_ijk ∈ S′} max( d_t_ijk − d_t′_ijk + γ, 0 )  (10)

where γ > 0 is a margin hyper-parameter with value 5, S is the set of valid triples, and S′ is the set of invalid triples; S′ is constructed according to formula (11) and comprises triples obtained by replacing the head entity as well as triples obtained by replacing the tail entity, where E ∖ {e_i} denotes the entity set E with e_i removed:

S′ = { (e′_i, r_k, e_j) | e′_i ∈ E ∖ {e_i} } ∪ { (e_i, r_k, e′_j) | e′_j ∈ E ∖ {e_j} }  (11)
in addition, an Adam optimizer with a learning rate of 0.001 is adopted to continuously optimize the model;
2) Second stage
The model uses ConvKB as the decoder: its convolutional layers analyze the global embedding features of the triple t_ijk = (e_i, r_k, e_j) across different dimensions, thereby generalizing the transitional characteristics of the model;

S11: during decoding, the model computes feature-map scores according to formula (12):

f(t_ijk) = ( ∥_{q=1}^{Ω} ReLU([h_i, g_k, h_j] ∗ ω^q) ) · W  (12)

where ω^q denotes the q-th filter, Ω is the number of filters, ∗ is the convolution operator, ∥ denotes concatenation, and W ∈ R^{Ωk×1} is a linear transformation matrix used to compute the final score of the triple; here c_ijk, as defined in formula (3), is the vector representation of the triple t_ijk;

LeakyReLU with a negative-slope hyper-parameter of 0.2 is adopted as the activation function;
S12: to improve the generalization ability of the model, the soft-margin loss function shown in formula (13) is adopted during training:

L = Σ_{t_ijk ∈ S ∪ S′} log( 1 + exp( l_t_ijk · f(t_ijk) ) ) + (λ/2) ‖W‖²_2  (13)

where l_t_ijk is the label distinguishing positive from negative examples: l_t_ijk = 1 when t_ijk ∈ S, and l_t_ijk = −1 when t_ijk ∈ S′; λ is the hyper-parameter of the L2 regularization term and takes the value 0.00001.
CN202210181938.2A 2022-02-25 2022-02-25 Medical knowledge graph relation prediction method based on graph attention mechanism Pending CN114610897A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210181938.2A | 2022-02-25 | 2022-02-25 | Medical knowledge graph relation prediction method based on graph attention mechanism


Publications (1)

Publication Number | Publication Date
CN114610897A | 2022-06-10

Family

ID=81858174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210181938.2A Pending CN114610897A (en) 2022-02-25 2022-02-25 Medical knowledge map relation prediction method based on graph attention machine mechanism

Country Status (1)

Country Link
CN (1) CN114610897A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861715A (en) * 2023-02-15 2023-03-28 创意信息技术股份有限公司 Knowledge representation enhancement-based image target relation recognition algorithm
CN117010494A (en) * 2023-09-27 2023-11-07 之江实验室 Medical data generation method and system based on causal expression learning
CN117010494B (en) * 2023-09-27 2024-01-05 之江实验室 Medical data generation method and system based on causal expression learning
CN117435747A (en) * 2023-12-18 2024-01-23 中南大学 Few-sample link prediction drug recycling method based on multilevel refinement network
CN117435747B (en) * 2023-12-18 2024-03-29 中南大学 Few-sample link prediction drug recycling method based on multilevel refinement network
CN117610662A (en) * 2024-01-19 2024-02-27 江苏天人工业互联网研究院有限公司 Knowledge graph embedding method for extracting representative sub-graph information through GAT
CN117747124A (en) * 2024-02-20 2024-03-22 浙江大学 Medical large model logic inversion method and system based on network excitation graph decomposition


Legal Events

Date | Code | Title | Description
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |