CN110334219A

CN110334219A - The knowledge mapping for incorporating text semantic feature based on attention mechanism indicates learning method

Info

Publication number: CN110334219A
Application number: CN201910629813.XA
Authority: CN
Inventors: 惠孛; 罗光春; 张栗粽; 卢国明; 李攀成
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2019-10-15
Anticipated expiration: 2039-07-12
Also published as: CN110334219B

Abstract

The present invention relates to knowledge mappings, it discloses a kind of knowledge mappings for incorporating text semantic feature based on attention mechanism to indicate learning method, translation model is solved to fail to fail to incorporate semantic feature, and the problem of Text Feature Extraction effect difference simultaneously for entity and relationship using semantic feature deficiency and multi-source information embedding grammar caused by entity and the description text of relationship.This method may be summarized to be: obtain the description text of simultaneously processing entities and relationship first, obtain its text semantic feature, then the projection matrix of the semantic feature of entity and relationship building entity is utilized, by entity vector projection into relation space, study is modeled and be indicated in relation space followed by the thought of translation, the complex relationship of multi-to-multi is modeled with this.The expression that the present invention is suitable for knowledge mapping learns.

Description

Knowledge graph representation learning method based on attention mechanism and integrated into text semantic features

Technical Field

The invention relates to a knowledge graph, in particular to a knowledge graph representation learning method based on an attention mechanism and integrated with text semantic features.

Background

With the development of internet technology, data is growing explosively. However, because the content on the internet is heterogeneous in multiple sources and loose in organization structure, and the information in the content is difficult to be utilized efficiently, Google proposes a concept of a Knowledge Graph (Knowledge Graph) in 5 months of 2012, and aims to convert massive unstructured or semi-structured data into normative, uniform, reliable and effective structured Knowledge, so that a highly interconnected semantic network is formed and support is provided for data mining and intelligent services.

A knowledge graph can be viewed as a network of directed graph structures, where graph nodes represent entities or concepts and edges in the graph represent relationships between entities and entities or between entities and concepts. Knowledge is generally described in the form of triples, i.e., (subject, predicate, object) or (entity, relationship, entity). Knowledge Graph representation learning (Knowledge Graph representation learning) aims at learning vectorized representations of entity relationships, converting Knowledge in symbolic form into calculable real-valued vectors.

In the traditional technology, there are many schemes for learning knowledge graph representation based on a translation model:

mikolov et al, using word embedding tool word2vec, find that there is a translation invariant phenomenon in the word vector space, such as v (king) -v (queen) -v (man) -v (wman), where v (king) represents the vector of word king obtained using word2 vec. Inspired by this phenomenon, Bordes et al proposed that the TransE model treats relationships in a knowledge graph as a translation (pan) operation of head to tail entities in an embedding space: if a triplet (h, r, t) exists or holds, then in the embedding space the head entity vector plus the relationship vector should be as close as possible to the tail entity vector, i.e., h + r ≈ t. Which defines a scoring function of

The TransE model is simple and effective, has expansibility on a large-scale knowledge graph, and has serious defects. The relation in the knowledge graph can be divided into 1-1, 1-N, N-1 and N-N according to the number of entities connected at two ends of the relation, and the model of TransE determines that the relation is only effective for the relation of 1-1, and the relation is effective for the relationThe type of the relationship is very problematic, such as in the N-1 relationship, (h_ir, T ∈ T, meaning h₀＝h₁＝…＝h_mThis is clearly not reasonable.

Aiming at the defects of the TransE in complex relations, the head and tail entity vectors are projected to a relation plane respectively by the TransH model and then are subjected to translation operation, so that the entities can have different representations under different relations. W for TransH_r、d_rTwo vectors represent the relationship r, where w_rIs a normal vector of the relation hyperplane, d_rFor the translation operation corresponding to the relationship, firstly, respectively projecting the head and tail entity vectors to a relationship plane, and then, performing the translation operation, wherein the corresponding scoring function is as follows:

both TransE and TransH assume that entities and relationships are in the same semantic space, whereas relationships and entities are different objects, and TransR models entities and relationships in different spaces. For a triplet (h, r, t), the entity is embeddedRelationship embeddingFor each relation r, a projection matrix is setFor projecting entities from an entity space to a relationship space. Similarly, its scoring function becomes:

TransD proposes a method of dynamically changing matrices to address multiple semantic representations of relationships. It defines two representations for each entity or relationship, one representing its own semantics (h, r, t) and the other (h)_p,r_p,t_p) Represents the way from entity vectors to the relational vector space, and the second representation will be used to construct the mapping matrix:

after the mapping matrix exists, the entity relation vector and the scoring function after the mapping can be obtained:

h_⊥＝M_rhh,t_⊥＝M_rtt (6)

it can be seen that models such as TransD and TransE, TransH, etc. essentially model only structural features inside a triple such as "translation", and ignore other semantic features of entities and relationships.

Some multi-source information embedding methods in the traditional technology introduce more semantic features for entity and relationship representation by embedding text corpora:

DKRL takes the text of an entity as an entity description, and proposes a knowledge representation learning method integrated with entity description information. Each entity is represented by two types: representation e based on structure_sAnd representation e based on the description_dThe triple score consists of two parts: e ═ E_S+E_D. The structure was represented using the TransE model: e_S＝‖h_s+r-t_sIito match the learning process based on the representation of the description with E_SAdaptation, E_DIs further divided into_DD＝‖h_d+r-t_d‖、E_DS＝‖h_d+r-t_s‖、E_SD＝‖h_s+r-t_dAnd II, three parts. Based on the representation of the description obtained by processing the text of the entity description, the authors have designed two ways, CBOW (Continuous Bag-of-Words) encoder and convolutional neural network encoder, to extract the semantic features of the entity description. It can be seen that the DKRL uses the TransE model when it is combined with entity description information, however TransE cannot model many-to-many relationships. In addition, the DKRL method only introduces description information for the entity, and does not consider the semantic characteristics of the relationship.

TEKE is also a representation learning method that uses text to enhance entity relationship semantics: given a knowledge graph KG and text corpora represented as word sequences, TEKE firstly uses an entity linking tool to label words in a corpus to obtain a labeling sequence D (x) corresponding to an entity in the knowledge graph₁,x₂,…x_n). To join knowledge-graph KG with textual information D, the author constructs a co-occurrence network G ═ X, Y consisting of entities and words, where X is_iNodes representing a network, corresponding to a word or an entity, y_ijDenotes x_iAnd x_jInter-occurrence frequency. Based on the co-occurrence network, selecting a set of tagged terms with co-occurrence frequency exceeding a given threshold as semantic context of the corresponding entity, and constructing vector representation thereof. The text processing mode of the TEKE constructed co-occurrence network is more traditional, the operation is complex, and the semantic information among the words in the sequence is not fully utilized.

In summary, the translation-based model essentially models only the structural features inside the triples, and does not utilize the description text of the entities and relationships, thus omitting other semantic features of the entity relationships in the knowledge graph. Under the condition, the entity relationship vectors are not fully learned due to the sparsity of the knowledge graph, translation characteristics are often only roughly met, the quality is not high, entities with the same relationship but different meanings are difficult to distinguish, and negative influences are brought to the accuracy of tasks such as subsequent knowledge fusion and knowledge graph complementation.

The multi-source information embedding methods such as DKRL, TEKE and the like expand the semantics of the entity by embedding entity description text corpora, but the methods have the following defects: firstly, DKRL adopts a TransE method to embed structures, and cannot meet many-to-many complex relationships in a knowledge graph; secondly, DKRL only embeds the entity text description, and integrates semantic features into the entity, but does not consider the semantic features of the relationship; and thirdly, when the DKRL and the TEKE process entity text description, a convolutional neural network and a co-occurrence network of words are respectively used, and the mutual influence among the words in a sequence is not considered.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: the method is used for solving the problems that semantic features are insufficient due to the fact that a translation model cannot utilize description texts of entities and relations, a multi-source information embedding method cannot simultaneously integrate the semantic features into the entities and the relations, and the text extraction effect is poor.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a knowledge graph representation learning method based on attention mechanism and text semantic features includes the following steps:

step 1, defining two representations for each entity and relation in a knowledge graph, wherein the two representations comprise semantic feature vector representation of the entity and semantic feature vector representation of a text;

step 2, aiming at each entity in the knowledge graph, obtaining a sentence containing the entity from a corpus and preprocessing the sentence, and then extracting semantic features of the sentence by adopting a self-attention mechanism to obtain a text semantic feature vector of the entity;

step 3, segmenting the name description of each relation in the knowledge graph to obtain a label word set, extracting semantic features of the label word set by adopting a self-attention mechanism, and obtaining a text semantic feature vector of the relation;

step 4, constructing a mapping matrix based on the text semantic feature vectors of the entities and the relations, and constructing a triple scoring function based on the idea of a translation model;

and 5, constructing a loss function based on intervals according to the triple scoring function, training a model by using a gradient descent optimization algorithm by using the knowledge map triple as a training set, and finally obtaining vector representation of the entity and the relation.

As a further optimization, step 2 specifically includes:

step 2.1, obtaining an entity description text and preprocessing:

for each entity e in the knowledge graph, acquiring at least one sentence containing the entity from a corpus as a description text of the entity, segmenting each sentence by using a segmentation tool, and then removing stop words to obtain a preprocessed word sequence;

step 2.2, building a text feature extraction model:

building a network model consisting of a plurality of layers of self-attention modules with multiple units, wherein the model is formed by stacking 3 same layers, namely 3 same layers in the longitudinal direction, each layer transversely comprises RH self-attention units for processing input so as to learn the characteristics of word sequences from different aspects, and each self-attention unit has a different parameter matrix; RH can be set by user;

step 2.3, obtaining vector representation x of each word as input of the model:

the vector representation of each word is represented by its word vectorAnd a position vectorIs calculated by

x＝l_word+l_pos (8)

Initializing the word vector by using a word embedding tool word2 vec; each word is calculated with a position code determined by its position order pos in the entity description text sequence, and the value of the ith dimension of the position vector is calculated in such a way that

Step 2.4, calculating the influence degree of each word and all other words in the sequence by using a self-attention mechanism to obtain the attention distribution of the self to other words, namely a weight value:

the influence degree between words is calculated by adopting multiplicative attention, and then multiplied by the original word vector to obtain the vector after attentionIs calculated by the formula

Where n is the number of words in the sequence,is a matrix of vectors of all words in the sequence,is a parameter matrix, W₁The value of (a) is initialized by normal distribution when training starts;

step 2.5, after feature extraction of 3 attention layers, adding all output vectors of the sequence and taking the added output vectors as entity semantic features through a ReLU activation function, wherein the calculation formula is as follows:

step 2.6, processing the RH different attention units, mapping the RH different attention units into a final entity semantic feature vector, wherein the calculation formula is as follows:

e_p＝ReLU(W₂E+b) (13)

whereinIs a mapping matrix, E is a matrix composed of the outputs of RH different attention cells, b is a bias vector, W₂And the value of b is initialized with a normal distribution at the beginning of training.

As a further optimization, step 3 specifically includes:

step 3.1, preprocessing the relation name:

for each relation r in the knowledge graph, segmenting the name of the relation r by using a segmentation tool to obtain a label word sequence;

step 3.2, building a text feature extraction model:

building a network model consisting of a single-layer multi-unit self-attention module, wherein the model is longitudinally provided with 1 self-attention layer, the layer transversely comprises RH self-attention units for processing input so as to learn the characteristics of word sequences from different aspects, and each self-attention unit is provided with a different parameter matrix; RH can be set by user;

step 3.3, obtaining the vector representation of each label word in the label word sequence as the input of the model;

step 3.4, calculating the matching degree between the label words by using a self-attention mechanism, and multiplying the matching degree by the original word vector to obtain an attention vector;

step 3.5, after feature extraction of the attention layer, adding all output vectors of the sequence and taking the output vectors as entity semantic features through a ReLU activation function;

step 3.6, RH different attention units are processed and mapped into an entity semantic feature vector r_p。

As a further optimization, step 4 specifically includes:

step 4.1, for a triplet (h, r, t), setting a projection matrix M for the head entity and the tail entity respectively_rhAnd M_rtProjecting entities from an entity space to a relationship space; the projection matrix is formed by the entities and the relations obtained in the step 2 and the step 3 respectivelyIs constructed by the semantic feature vector of

Wherein B is^k×dIs a parameter matrix to be learned;

and 4.2, multiplying the head entity and the tail entity with the respective projection matrixes to calculate the projection of the entities in the relation space, namely: h is_⊥＝M_rhh、t_⊥＝M_rtt；

4.3, in the relation space, according to the thought of the translation model, regarding the relation as the translation operation from the head entity to the tail entity, and constructing a triple scoring function as follows:

as a further optimization, step 5 specifically includes:

step 5.1, defining an interval-based hinge loss function to train a model by taking all original triples T in the knowledge graph as a training set, wherein the goal is to enable the triple scoring function to obtain a lower score for positive-case triples and a higher score for negative-case triples, and the loss function is as follows:

wherein, T'_(h,r,t){ (h ', r, t) | h' E, h '≠ h }, { (h, r, t') | t 'E, t' ≠ t } is a negative example set constructed on the basis of the triplet (h, r, t), with an interval value γ>0 is a hyperparameter;

step 5.2, for any entity, force its vector's L2 norm to 1, i.e.Thereby regularizing the entity embedding vector into a unit sphere;

step 5.3, in the training process, the fact triples of the knowledge graph are traversed for multiple times randomly, when each fact triplet is visited, a negative example triplet is constructed, and the selection mode of the negative example entity is as follows: adopting a K-nearest neighbor method, firstly calculating the similarity of an entity to be replaced and other entities by using a cosine similarity algorithm, sequencing from high to low, and then taking top-K entities as a negative example candidate set of the entity to be replaced;

step 5.4, optimizing the objective function by using a small batch gradient descent algorithmThe gradient is then calculated and the model parameters are updated.

The invention has the beneficial effects that:

(1) the invention simultaneously integrates structural features and text semantic features for entities and relations:

the invention embeds the text corpora described by the entity and the relation respectively, uses the text corpora to construct the entity and construct the projection matrix to the relation space, and finally performs expression learning based on the translation thought in the relation space, thereby not only considering the semantic characteristics of the entity and the relation, but also skillfully combining the structure embedding and the text embedding.

(2) Compared with other multi-source information embedding methods, the method can extract richer semantic features:

the method has the advantages that the attention mechanism is superior to natural language processing, the entity description and the relation description are processed by the multi-layer self-attention method, and semantic features with higher quality can be extracted efficiently.

(3) The K-nearest neighbor negative sampling method can enable the model to show better distinguishing capability:

the K-nearest neighbor negative sampling method can improve the quality of negative example triples, so that the learning of a model is enhanced, and the final model can better distinguish correct triples from error triples.

Drawings

FIG. 1 is a schematic diagram of a learning method of a knowledge graph representation based on attention mechanism and text semantic features;

FIG. 2 is a flow chart of a knowledge graph representation learning method of the present invention based on attention mechanism incorporating text semantic features;

FIG. 3 is a schematic diagram of text feature extraction based on the attention mechanism.

Detailed Description

The invention aims to provide a knowledge graph representation learning method based on attention mechanism and integrated into text semantic features, and solves the problems that a translation model cannot utilize description texts of entities and relations, so that the semantic features are insufficient, a multi-source information embedding method cannot integrate the semantic features into the entities and the relations at the same time, and the text extraction effect is poor.

The knowledge graph representation learning method of the invention combines text embedding and translation ideas. Firstly, obtaining and processing description texts of entities and relations to obtain text semantic features of the description texts, then constructing a projection matrix of the entities by using the semantic features of the entities and the relations, projecting entity vectors into a relation space, modeling in the relation space by using a translation idea and performing representation learning, so as to model many-to-many complex relations. The implementation principle of the method is as shown in fig. 1, and in the relationship space, the relationship is regarded as the translation operation from the head entity to the tail entity according to the idea of the translation model.

The knowledge graph representation learning method of the invention is shown in fig. 2, and comprises the following implementation steps:

step 1, defining two representations for each entity e in the knowledge graph, wherein one representation is the semantic feature of the entity and is represented as e. The other is the text semantic feature of the entity, denoted as e_p. The two representations are also defined for each relationship r in the knowledge-graph.

Step 2, for each entity e in the knowledge graph, obtaining sentences containing the entity from the corpus and preprocessing the sentences, and then extracting semantic features of the sentences by adopting a self-attention mechanism to obtain text semantic feature vectors e of the entity_p。

Step (ii) of3. For each relation r in the knowledge graph, segmenting the name description of the relation r to obtain a label word set, extracting semantic features of the label word set by adopting a self-attention mechanism to obtain a semantic feature vector r of the relation_p。

And 4, constructing a mapping matrix by using the semantic feature vectors of the entities and the relations, and constructing a triple scoring function, namely an energy equation, based on the translation idea.

And 5, constructing a loss function based on intervals according to the triple scoring function, training a model by using a gradient descent optimization algorithm by using the knowledge map triples as a training set, and finally obtaining vector representation of the entity and the relation.

In specific implementation, the required original data are a triple set of the knowledge graph and a corpus text set of the same language as the knowledge graph. The specific implementation of each step is further described as follows:

in step 1, all entities and relations of the knowledge graph are obtained, tenserflow is used for initializing two vectors of the entities and the relations, the dimension values of the entities and the relation vectors are respectively hyperparameters d and k, and the hyperparameters can be selected from {50,70,80 and 100 }. Semantic feature vectors for entities and relationships themselves use boundaries asThe uniformity distribution of (d) is initialized. Textual semantic features e of entities and relationships_pAnd r_pIt is not initialized randomly but calculated by step 2 and step 3.

Step 2, using description text of the entity as input, then adopting a self-attention mechanism to extract semantic features of the sentence, and outputting a vector e_p. The method comprises the following specific steps:

step 2.1, preprocessing an entity description text:

for each entity e in the knowledge graph, at least one sentence containing the entity is obtained from the corpus to serve as a description text of the entity, a word segmentation tool is used for segmenting each sentence, and then stop words are removed to obtain a preprocessed word sequence.

Step 2.2, building a text feature extraction model:

the basic processing unit of feature extraction is to apply a self-attention mechanism to the sequence, the model is composed of multiple layers and multiple units of self-attention modules, each layer has RH self-attention units, and the input is processed to learn the features of the sequence from different aspects. The model consists of a total of 3 identical layers stacked with CH ═ 3, i.e. 3 identical layers in the longitudinal direction, each layer containing RH self-attentive units in the lateral direction, as shown in fig. 3. Where each self-attention cell has a different parameter matrix. In the network model building, RH can be set by self-defining and can be selected from {1,2,3,4 }.

Step 2.3, the input to the model is a vector representation x for each word. The vector representation of each word is represented by its word vector And a position vectorIs calculated by

x＝l_word+l_pos (8)

The word vector is initialized with the word embedding tool word2 vec. Each word is calculated with a position code determined by its position order pos in the entity description text sequence, and the value of the ith dimension of the position vector is calculated in such a way that

Step 2.4, calculating the influence degree of each word and all other words in the sequence by using a self-attention mechanism to obtain the attention distribution (namely weight value) of the word to other words, wherein the weight value determines each wordHow much a word is expressed at the position to which it belongs. The influence degree between the words is calculated by adopting multiplicative attention, and then multiplied by the original word vector to obtain the vector after attentionIs calculated by the formula

Where n is the number of words in the sequence.Is a matrix of vectors of all words in the sequence.Is a parameter matrix, W₁The values of (c) can be initialized with a normal distribution at the beginning of training. Is divided byThe purpose of (1) is to scale the weight values to prevent them from being too large.

Step 2.5, after the feature extraction of CH attention layers, adding all output vectors of the sequence and taking the sum as an entity semantic feature through a ReLU activation function, wherein the calculation formula is

And 2.6, in order to integrate the semantic features learned in different aspects, finally, processing the RH different attention units, and mapping the RH different attention units into a final entity semantic feature vector. The calculation method is

e_p＝ReLU(W₂E+b) (13)

WhereinIs a mapping matrix, E is RH different attentionThe output of the force cell constitutes a matrix, b is the offset vector. W₂And the values of b may be initialized with a normal distribution at the beginning of training.

Step 3, taking the name label words of the relation as input, adopting a self-attention mechanism to extract semantic features of the label word set, and outputting a vector r_p. The method specifically comprises the following steps:

step 3.1, preprocessing the relation name: and for each relation r in the knowledge graph, segmenting the name of the relation r by using a segmentation tool to obtain a label word sequence. For example, the relationship name "/actent/traffic _ actent/res-reactive _ party" is processed to obtain the tag word set of { actent, traffic, actent, response, party }.

Step 3.2, building a text feature extraction model:

similar to the model for extracting the entity semantic features, the model is composed of a single-layer multi-unit self-attention module, each layer has RH self-attention units, and since the relationship description contains few words and a small number of word ranges, the extraction model for extracting the relationship semantic features only contains 1 CH self-attention layer, that is, 1 self-attention layer is provided in the longitudinal direction, and each layer laterally contains RH self-attention units. Where each self-attention cell has a different parameter matrix. In the network model building, RH can be set by self and is generally selected from {1,2,3,4 }.

Step 3.3, the input of the model is the vector representation of each label word in a sequence, the calculation mode is consistent with the calculation mode in the step 2.3, firstly, the word vector of each word is initialized by using a word embedding tool word2vec to obtain the word vectorThe embedding dimension k is chosen from {50,70,80,100 }. Calculating a position vector of each tag word using formula (8) and formula (9)

And 3.4, calculating the matching degree between the label words by using a self-attention mechanism, multiplying the label words by the original word vector to obtain the vector after attention, wherein the calculation mode is consistent with the step 2.4.

And 3.5, after the characteristics of the CH attention layers are extracted, adding all output vectors of the sequence, and taking the sum as the entity semantic characteristics through a ReLU activation function. The calculation is in accordance with step 2.5.

Step 3.6, in order to synthesize the learned semantic features in different aspects, finally, RH different attention units are processed and mapped into a final entity semantic feature vector r_p. The calculation method is consistent with the step 2.6

And 4, constructing a mapping matrix by using the semantic feature vectors of the entities and the relations, and constructing a triple scoring function, namely an energy equation, based on the translation idea. The method specifically comprises the following steps:

step 4.1, for a triplet (h, r, t), setting a projection matrix M for the head entity and the tail entity respectively_rhAnd M_rtFor projecting entities from the entity space to the relationship space. The projection matrix is constructed by the respective semantic feature vectors of the entities and the relations obtained in the step 2 and the step 3, and the calculation mode is

Wherein B is^k×dIs a parameter matrix to be learned.

And 4.2, multiplying the head and tail entities with the respective projection matrixes to calculate the projection of the entities in the relation space, namely: h is_⊥＝M_rhh、t_⊥＝M_rtt。

Step 4.3, in the relation space, following the thought of the translation model, regarding the relation as the translation operation from the head entity to the tail entity, and constructing a triple scoring function (namely an energy equation) as

And 5, constructing a loss function based on intervals according to the triple scoring function, taking the knowledge-map triple as a training set, training a model by adopting a gradient descent optimization algorithm, and finally obtaining vector representation of the entity and the relation. The detailed steps are as follows:

and 5.1, defining an interval-based hinge loss function by taking all original triples T in the knowledge graph as a training set to train the model. The goal is to make the triplet scoring function get a lower score (energy) for positive-case triples and a higher score for negative-case triples. The loss function is

Wherein, T'_(h,r,t){ (h ', r, t) | h' ∈ E, h '≠ h } < { (h, r, t') | t '∈ E, t' ≠ t } is a negative case set constructed on the basis of the triplet (h, r, t). Interval value gamma>0 is a hyperparameter, which can be selected from {1,2,3,4 }.

Step 5.2, for any entity, force its vector's L2 norm to 1, i.e.Therefore, the entity embedding vector is regulated into a unit spherical surface, and invalid convergence of the target function can be prevented by artificially increasing the entity embedding specification.

And 5.3, in the training process, randomly traversing the fact triples (training sets) of the knowledge graph for multiple times, and constructing negative example triples for each fact triplet when the fact triplet is visited. The negative examples entity is not selected in the entity set, but a K-nearest neighbor method is adopted, the similarity of the entity to be replaced and other entities is calculated by utilizing a cosine similarity algorithm, the entities are ranked from high to low, and then top-K entities are taken as the negative examples candidate set of the entity to be replaced.

Step 5.4, optimizing the objective function by using small-batch Gradient declineThe learning rate mu is selected from {0.1,0.01,0.001}, and the batch size value B is selected from {200,500,1400,4800 }. After the small batch, the gradient is calculated and the model parameters are updated.

Compared with the prior art, the scheme based on the invention at least has the following advantages:

the translation models such as TransE, TransH, TransR, TransD and the like only model structural features inside the triples, and the defects are that other semantic features of entity relations in the knowledge graph are ignored. On the basis of TransE, other multi-source information embedding methods such as TEKE and DKRL embed the text description of the entity, so that the semantic characteristics of the description text are introduced into the entity, but the method still has the following defects: firstly, the used TransE can not satisfy many-to-many complex relations in the knowledge graph, and secondly, semantic features are only blended into the entity. The method and the device respectively embed the text corpora described by the entity and the relation description, use the text corpora for constructing the entity and constructing the projection matrix to the relation space, and finally perform expression learning in the relation space based on the translation thought. Not only the semantic features of the entities and the relations are considered at the same time, but also the structure embedding and the text embedding are skillfully combined.

TEKE processes entity description text based on co-occurrence network of words and entities, DKRL processes entity description text using continuous bag-of-words model or convolutional neural network, which belongs to more traditional mode in natural language processing, benefits from the advantage of attention mechanism in natural language processing.

arbitrarily choosing one from the entire set of entities to replace may result in the generation of very easily distinguishable negative example triples, such as for triples (Beijing, caliper of, China), generated by replacing the head entity (Water, caliper of, China) or replacing the tail entity (Beijing, caliper of, airplan), which are both significantly erroneous or even illogical. The Beijing and Water corresponding vectors are originally far apart in the same space, so that the negative example triplet contributes little to model learning. Instead, (Hong Kong, calital of China) is a close but wrong triplet. The K-nearest neighbor negative sampling method can improve the quality of negative example triples, so that the learning of a model is enhanced, and the final model can better distinguish correct triples from error triples.

Claims

1. A knowledge graph representation learning method based on attention mechanism and integrated with text semantic features is characterized by comprising the following steps:

2. The attention-based mechanism-fused-to-text semantic feature knowledge graph representation learning method of claim 1,

the method is characterized in that the step 2 specifically comprises the following steps:

step 2.1, obtaining an entity description text and preprocessing:

step 2.2, building a text feature extraction model:

step 2.3, obtaining vector representation x of each word as input of the model:

x＝l_word+l_pos (8)

e_p＝ReLU(W₂E+b) (13)

wherein Is a mapping matrix, E is a matrix composed of the outputs of RH different attention cells, b is a bias vector, W₂And the value of b is initialized with a normal distribution at the beginning of training.

3. The attention-based mechanism-fused-to-text semantic feature knowledge graph representation learning method of claim 1,

the method is characterized in that the step 3 specifically comprises the following steps:

step 3.1, preprocessing the relation name:

step 3.2, building a text feature extraction model:

4. The method for learning knowledge graph representation fused with text semantic features based on attention mechanism as claimed in any one of claims 1-3, wherein the step 4 specifically comprises:

step 4.1, for a ternarySets (h, r, t) for setting a projection matrix M for the head entity and the tail entity, respectively_rh and M_rtProjecting entities from an entity space to a relationship space; the projection matrix is constructed by the respective semantic feature vectors of the entities and the relations obtained in the step 2 and the step 3, and the calculation mode is

wherein B^k×dIs a parameter matrix to be learned;

5. the attention-based mechanism-fused-to-text semantic feature knowledge graph representation learning method of claim 4,

the method is characterized in that the step 5 specifically comprises the following steps:

wherein ,T′_(h，r，t){ (h ', r, t) | h' is E, h '≠ h }, { (h, r, t') | t 'is E, t' ≠ t } is a negative example set constructed on the basis of the triplet (h, r, t), the interval value γ > 0 being a hyperparameter;