CN110334219B

CN110334219B - Knowledge graph representation learning method based on attention mechanism integrated with text semantic features

Info

Publication number: CN110334219B
Application number: CN201910629813.XA
Authority: CN
Inventors: 惠孛; 罗光春; 张栗粽; 卢国明; 李攀成
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2023-05-09
Anticipated expiration: 2039-07-12
Also published as: CN110334219A

Abstract

The invention relates to a knowledge graph, discloses a knowledge graph representation learning method based on a attention mechanism and integrating text semantic features, and solves the problems that semantic features are insufficient due to the fact that a translation model fails to utilize descriptive texts of entities and relations, the multi-source information embedding method fails to integrate the semantic features for the entities and the relations at the same time, and the text extraction effect is poor. The method can be summarized as follows: firstly, acquiring and processing description texts of entities and relations to obtain text semantic features, then constructing a projection matrix of the entities by utilizing the semantic features of the entities and the relations, projecting entity vectors into a relation space, modeling and carrying out representation learning in the relation space by utilizing the translation thought, and thus modeling the many-to-many complex relation. The method is suitable for the representation learning of the knowledge graph.

Description

Knowledge graph representation learning method based on attention mechanism integrated with text semantic features

Technical Field

The invention relates to a knowledge graph, in particular to a knowledge graph representation learning method based on a attention mechanism integrated with text semantic features.

Background

With the development of internet technology, data has shown explosive growth. However, because of heterogeneous content on the internet, the organization structure is loose, and it is difficult to efficiently use the information therein, google proposes the concept of a Knowledge Graph (knowledgegraph) in 2012, and aims to convert massive unstructured or semi-structured data into structured Knowledge which is unified and reliable in specification, thereby forming a semantic web with high internet, and providing support for data mining and intelligent services.

Knowledge graph can be regarded as a network of directed graph structures, wherein graph nodes represent entities or concepts, and edges in the graph represent relationships between entities and entities or between entities and concepts. Knowledge, i.e., (subject, predicate, object) or (entity, relationship, entity), is typically described in the form of triples. Knowledge-graph representation learning (Knowledge Graph Representation Learning) aims to learn a vectorized representation of entity relationships, converting knowledge in symbolic form into computable real-valued vectors.

In the conventional technology, a plurality of schemes for learning knowledge graph representation based on translation models exist:

mikolov et al found that there was a shift-invariant phenomenon in the word vector space using word2vec, such as v (king) -v (queen) ≡v (man) -v (woman), where v (king) represents the vector of word king obtained using word2 vec. Inspired by this phenomenon, bordes et al propose translation (translation) operations in which the relationships in the knowledge-graph are treated as head-entity to tail-entity translations in the embedding space by the TransE model: if a triplet (h, r, t) exists or holds, then in the embedding space the head entity vector plus the relation vector should be as close as possible to the tail entity vector, i.e. h+r≡t. Which defines a scoring function of

The TransE model is simple and effective, has expansibility on a large-scale knowledge graph, but has serious defects. The relationship in the knowledge graph can be divided into four types of 1-1, 1-N, N-1 and N-N according to the number of entities connected at two ends of the knowledge graph, and the TransE model determines that the knowledge graph is effective only for the relationship of 1-1, and has great problems for other relationship types, such as under the N-1 relationship,

(h _i r, T) ∈T, meaning h ₀ ＝h ₁ ＝…＝h _m This is obviously unreasonable.

Aiming at the defect of the TransE on the complex relation, the head and tail entity vectors are respectively projected to the relation plane by the TransH model and then the translation operation is carried out, so that the entities can have different representations under different relations. W for TransH _r 、d _r The two vectors represent the relationship r, where w _r Is the normal vector of the relation hyperplane, d _r For translation operation corresponding to the relation, first, the head entity vector and the tail entity vector are respectively projected to a relation plane, then the translation operation is carried out, and the corresponding scoring function is as follows:

both TransE and TransH assume that entities and relationships are in the same semantic space, while relationships and entities are different objects, and TransR models entities and relationships in different spaces. For a triplet (h, r, t), entity embedding

Relational embedding->

For each relation r a projection matrix is set +.>

For projecting entities from the entity space to the relationship space. Similarly, its scoring function becomes:

/>

TransD proposes a method of dynamically changing matrices to address multiple semantic representations of relationships. It defines two representations for each entity or relationship, one (h, r, t) representing its own semantics and the other (h _p ，r _p ，t _p ) Representing the way in which the entity vector is projected into the relationship vector space, and the second representation will be used to construct the mapping matrix:

the projected entity relation vector and scoring function can be obtained after the mapping matrix is provided:

h _⊥ ＝M _rh h，t _⊥ ＝M _rt t (6)

it can be seen that the models of TransD and TransE, transH are essentially modeling only the structural features within a triplet, i.e., a "translation", while ignoring other semantic features of entities and relationships.

Some multi-source information embedding methods in the conventional technology introduce more semantic features for entity and relation representation by embedding text corpus:

the DKRL takes the text of the entity as the entity description, and proposes a knowledge representation learning method integrated with the entity description information. Each entity has two representations: structure-based representation e _s And description-based representation e _d The triplet score consists of two parts: e=e _S +E _D . The representation of the structure uses the TransE model: e (E) _S ＝||h _s +r-t _s To relate the learning process of the description-based representation to E _S Adaptation, E _D And is further divided into E _DD ＝||h _d +r-t _d ||、E _DS ＝||h _d +r-t _s ||、E _SD ＝||h _s +r-t _d Three parts. Based on the representation of the description obtained by processing the text of the entity description, the author designed two ways of CBOW (Continuous Bag of Words) encoders and convolutional neural network encoders to extract semantic features of the entity description. It can be seen that DKRL uses the transition model when combining entity description information, however, the transition cannot model many-to-many relationships. In addition, the DKRL method only introduces descriptive information for the entity, and does not consider semantic features of the relationship.

TEKE is also a representation learning method that uses text to enhance entity relationship semantics: given a knowledge graph KG and a text corpus expressed as a word sequence, TEKE firstly marks words in a language library by using an entity linking tool to obtain a marking sequence D= (x) of the entities in the corresponding knowledge graph ₁ ，x ₂ ，…x _n )。In order to combine the knowledge graph KG with the text information D, the author constructs a co-occurrence network g= (X, Y) consisting of entities and words, where X _i Representing nodes of the network, corresponding to a word or an entity, y _ij Represents x _i and x_j Co-occurrence frequency between them. Based on the co-occurrence network, a set of tagged words whose co-occurrence frequency exceeds a given threshold is selected as the semantic context of the corresponding entity and a vector representation thereof is constructed. The text processing mode of the TEKE construction co-occurrence network is traditional, the operation is complex, and semantic information among words in the sequence is not fully utilized.

In summary, the translation-based model essentially models only the structural features inside the triples, and does not use the descriptive text of the entities and relationships, so that other semantic features of the entity relationships in the knowledge graph are ignored. Under the circumstance, the sparsity of the knowledge graph leads to insufficient learning of the entity relation vector, often only roughly meets the translation characteristic, has low quality, leads to difficult distinction of some entities with the same relation but different meanings, and brings negative influence to the accuracy of the tasks such as subsequent knowledge fusion, knowledge graph completion and the like.

The multi-source information embedding methods such as DKRL, TEKE and the like expand the semantics of the entity by embedding entity description text corpus, but the methods have the following defects: firstly, DKRL is subjected to structure embedding by using a TransE method, so that the complex relation of many-to-many in a knowledge graph cannot be satisfied; secondly, DKRL only embeds the entity text description, so that semantic features are integrated for the entity, but the semantic features of the relation are not considered; thirdly, when the DKRL and the TEKE process entity text descriptions, a convolutional neural network and a co-occurrence network of words are respectively used, and interaction among the words in the sequence is not considered.

Disclosure of Invention

The technical problems to be solved by the invention are as follows: the knowledge graph representation learning method based on the attention mechanism and integrating the text semantic features is provided, and the problems that the semantic features are insufficient due to the fact that a translation model fails to utilize the descriptive text of the entity and the relation, the semantic features cannot be integrated for the entity and the relation by the multi-source information embedding method at the same time, and the text extraction effect is poor are solved.

The technical scheme adopted for solving the technical problems is as follows:

the knowledge graph representation learning method based on the attention mechanism integrated with the text semantic features comprises the following steps:

step 1, defining two representations for each entity and relation in a knowledge graph, wherein the two representations comprise semantic feature vector representations of the entity and the relation in the knowledge graph;

step 2, aiming at each entity in the knowledge graph, acquiring sentences containing the entity from a corpus, preprocessing, and extracting semantic features of the sentences by adopting a self-attention mechanism to obtain text semantic feature vectors of the entity;

step 3, aiming at each relation in the knowledge graph, word segmentation is carried out on the name description of the relation to obtain a tag word set, and semantic features of the tag word set are extracted by adopting a self-attention mechanism to obtain text semantic feature vectors of the relation;

step 4, constructing a mapping matrix based on text semantic feature vectors of the entities and the relations, and constructing a triplet scoring function based on the thought of the translation model;

and 5, constructing a loss function based on the interval according to the triplet scoring function, taking the knowledge-graph triplet as a training set, training a model by adopting a gradient descent optimization algorithm, and finally obtaining vector representation of the entity and the relation.

As a further optimization, step 2 specifically includes:

step 2.1, acquiring entity description text and preprocessing:

for each entity e in the knowledge graph, acquiring at least one sentence containing the entity from a corpus as a description text of the entity, using a word segmentation tool to segment each sentence, and then removing stop words to obtain a preprocessed word sequence;

step 2.2, constructing a text feature extraction model:

constructing a network model consisting of multi-layer multi-unit self-attention modules, wherein the model is formed by stacking 3 identical layers in total, namely 3 identical layers exist in the longitudinal direction, each layer transversely comprises RH self-attention units for processing input so as to learn the characteristics of word sequences from different aspects, and each self-attention unit has different parameter matrixes; RH can be set in a self-defined manner;

step 2.3, obtaining vector representation x of each word as input of a model:

the vector representation of each word is represented by its word vector

And position vector->

And composition of (a) in a calculation manner of

x＝l _word +l _pos (8)

Initializing word vectors by using a word embedding tool word2 vec; each word is calculated to have a position code determined by its position order pos in the entity description text sequence in such a way that the value of the ith dimension of the position vector is calculated as

Step 2.4, calculating the influence degree of each word and all other words in the sequence by using a self-attention mechanism so as to obtain the attention distribution of the self-attention to the other words, namely the weight value:

the influence degree between words is calculated by using multiplicative attention, and then multiplied by the original word vector to obtain the vector after attention

The calculation formula is that

Where n is the number of words in the sequence,

is a matrix of vectors of all words in the sequence,

as a parameter matrix, W ₁ Initializing the values of (2) by adopting normal distribution at the beginning of training;

step 2.5, after feature extraction of 3 attention layers, adding all output vectors of the sequence, and taking the added output vectors as entity semantic features through a ReLU activation function, wherein a calculation formula is as follows:

step 2.6, processing RH different attention units, mapping the RH different attention units into a final entity semantic feature vector, and calculating the final entity semantic feature vector according to the following formula:

e _p ＝ReLU(W ₂ E+b) (13)

wherein

Is a mapping matrix, E is a matrix of RH outputs of different attention units, b is a bias vector, W ₂ And the value of b is initialized with a normal distribution at the beginning of training.

As a further optimization, step 3 specifically includes:

step 3.1, preprocessing the relation name:

for each relation r in the knowledge graph, word segmentation is carried out on the names of the relations r by using a word segmentation tool to obtain a tag word sequence;

step 3.2, constructing a text feature extraction model:

constructing a network model consisting of a single-layer multi-unit self-attention module, wherein the model is provided with 1 self-attention layer in the longitudinal direction, and the layer transversely comprises RH self-attention units for processing input to learn the characteristics of word sequences from different aspects, and each self-attention unit is provided with a different parameter matrix; RH can be set in a self-defined manner;

step 3.3, obtaining vector representation of each tag word in the tag word sequence as input of a model;

step 3.4, calculating the matching degree between the tag words by using a self-attention mechanism, and multiplying the matching degree with the original word vector to obtain an attention vector;

step 3.5, after feature extraction of the attention layer, adding all output vectors of the sequence and using a ReLU activation function as a relation semantic feature;

step 3.6, processing RH different attention units, mapping the RH different attention units into a relation semantic feature vector r _p 。

As a further optimization, step 4 specifically includes:

step 4.1, for a triplet (h, r, t), setting a projection matrix M for the head entity and the tail entity, respectively _rh and M_rt For projecting an entity from an entity space to a relationship space; the projection matrix is constructed by semantic feature vectors of the entities and the relations obtained in the step 2 and the step 3, and the calculation mode is that

wherein B^k×d Is a parameter matrix to be learned;

step 4.2, multiplying the head entity, the tail entity and the respective projection matrix to calculate the projection of the entity in the relation space, namely: h is a _⊥ ＝M _rh h、t _⊥ ＝M _rt t；

And 4.3, in a relation space, following the thought of a translation model, regarding the relation as translation operation from a head entity to a tail entity, and constructing a triplet scoring function as follows:

as a further optimization, step 5 specifically includes:

step 5.1, taking all original triples T in the knowledge graph as training sets, defining an interval-based hinge loss function to train a model, wherein the goal is to enable a triplet scoring function to obtain a lower score for a positive case triplet and a higher score for a negative case triplet, and the loss function is as follows:

wherein ,T′_(h，r，t) = { (h ', r, t) |h' ∈e, h 'noteqh }) u { (h, r, t')|t '∈e, t' noteqt } is a negative example set constructed on the basis of triples (h, r, t), and interval value γ > 0 is a super parameter;

step 5.2, for any entity, forcing its vector L2 norm to be 1, i.e

Thereby regularizing the entity embedded vector into a unit sphere;

in the training process, the fact triples of the knowledge graph are randomly traversed for a plurality of times, and when each fact triplet is accessed, a negative example triplet is constructed for the fact triplet, and the selection mode of a negative example entity is as follows: adopting a K-neighbor method, firstly calculating the similarity of an entity to be replaced and other entities by using a cosine similarity algorithm, sequencing from high to low, and then taking top-K entities as negative example candidate sets of the entity to be replaced;

step 5.4 optimizing the objective function Using a small batch gradient descent algorithm

Gradients are then calculated and model parameters are updated.

The beneficial effects of the invention are as follows:

(1) The invention integrates structural features and text semantic features for the entities and the relations at the same time:

the invention respectively embeds the text corpus of the entity description and the relation description, and uses the text corpus to construct a projection matrix of the entity to the relation space, and finally performs representation learning based on translation ideas in the relation space, thereby not only simultaneously considering semantic features of the entity and the relation, but also skillfully combining structure embedding and text embedding.

(2) Compared with other multisource information embedding methods, the method can extract richer semantic features:

the multi-layer self-attention method adopted by the invention processes entity description and relation description, and can efficiently extract semantic features with higher quality by virtue of the advantage of attention mechanism on natural language processing.

(3) The negative sampling method of the K-neighbor can enable the model to show better distinguishing capability:

the K-neighbor negative sampling method can improve the quality of the negative case triples, thereby enhancing the learning of the model and enabling the final model to better distinguish correct triples from error triples.

Drawings

FIG. 1 is a schematic diagram of a knowledge graph representation learning method based on attention mechanism incorporated text semantic features;

FIG. 2 is a flow chart of a learning method of knowledge graph representation based on attention mechanism merging text semantic features;

fig. 3 is a schematic diagram of text feature extraction based on an attention mechanism.

Detailed Description

The invention aims to provide a knowledge graph representation learning method based on the integration of a attention mechanism into text semantic features, which solves the problems that the semantic features are insufficient due to the fact that a translation model fails to utilize descriptive texts of entities and relations, the multi-source information embedding method fails to integrate the semantic features into the entities and the relations at the same time, and the text extraction effect is poor.

The knowledge graph representation learning method combines text embedding and translation ideas. Firstly, acquiring and processing description texts of entities and relations to obtain text semantic features, then constructing a projection matrix of the entities by utilizing the semantic features of the entities and the relations, projecting entity vectors into a relation space, modeling and carrying out representation learning in the relation space by utilizing the translation thought, and thus modeling the many-to-many complex relation. The implementation principle is shown in fig. 1, and in the relation space, the relation is regarded as a translation operation from a head entity to a tail entity according to the idea of a translation model.

The knowledge graph representation learning method of the invention is shown in fig. 2, and comprises the following implementation steps:

step 1, defining two types of representation for each entity e in the knowledge graph, wherein one type of representation is semantic features of the entity itself and is represented as e. The other is the text semantic feature of the entity, denoted as e _p . The two representations are also defined for each relationship r in the knowledge-graph.

Step 2, for each entity e in the knowledge graph, acquiring sentences containing the entity from a corpus, preprocessing, and extracting semantic features of the sentences by adopting a self-attention mechanism to obtain text semantic feature vectors e of the entities _p 。

Step 3, for each relation r in the knowledge graph, word segmentation is carried out on the name description of each relation r to obtain a tag word set, and semantic features of the tag word set are extracted by adopting a self-attention mechanism to obtain semantic feature vectors r of the relations _p 。

And 4, constructing a mapping matrix by utilizing semantic feature vectors of the entities and the relations, and constructing a triplet scoring function, namely an energy equation, based on the translation thought.

In the implementation, the required original data are a triplet set of the knowledge graph and a corpus-text set of the same language as the knowledge graph. The specific implementation means of each step are further described below:

in step 1, all entities and relations of the knowledge graph are firstly obtained, two vectors of the entities and the relations are initialized by using tensorflow, and the dimension values of the entities and the relations are respectively super-parameters d and k and can be selected from {50, 70, 80, 100 }. The semantic feature vectors of entities and relationships themselves use boundaries as

Is initialized for the uniform distribution of the memory. Text semantic feature e of entities and relationships _p and r_p Not randomly initialized, but calculated by the steps 2 and 3.

Step 2, using the description text of the entity as input, extracting the semantic features of sentences by adopting a self-attention mechanism, and outputting a vector e _p . The method comprises the following specific steps:

step 2.1, preprocessing entity description text:

for each entity e in the knowledge graph, at least one sentence containing the entity is obtained from the corpus as a description text of the entity, a word segmentation tool is used for segmenting each sentence, and then the stop words are removed to obtain a preprocessed word sequence.

Step 2.2, constructing a text feature extraction model:

the basic processing unit of feature extraction is to apply a self-attention mechanism to the sequence, the model consists of a multi-layer, multi-unit self-attention module, each layer having RH self-attention units, processing the input to learn the features of the sequence from different aspects. The model is stacked with a total of ch=3 identical layers, i.e. 3 identical layers in the longitudinal direction, each layer containing RH self-attention units in the lateral direction, as shown in fig. 3. Wherein each self-attention unit has a different parameter matrix. In the network model building, RH can be set in a self-defined manner, and is generally selected from {1,2,3,4 }.

Step 2.3, input of the model isThe vector for each word represents x. The vector representation of each word is represented by its word vector

And position vector->

And composition of (a) in a calculation manner of

x＝l _word +l _pos (8)

The word vector is initialized with the word embedding tool word2 vec. Each word is calculated to have a position code determined by its position order pos in the entity description text sequence in such a way that the value of the ith dimension of the position vector is calculated as

And 2.4, calculating the influence degree of each word and all other words in the sequence by using a self-attention mechanism to obtain the attention allocation (namely weight value) of the self-attention to the other words, wherein the weight value determines how much each word is expressed in the belonged position. The influence degree between words is calculated by using multiplicative attention, and then multiplied by the original word vector to obtain the vector after attention

The calculation formula is that

Where n is the number of words in the sequence.

Is a matrix of vectors of all words in the sequence.

As a parameter matrix, W ₁ The values of (2) may be initialized with a normal distribution at the beginning of training. Divided by->

The purpose of (a) is to scale the weight value to prevent its value from becoming too large.

Step 2.5, after feature extraction of CH attention layers, adding all output vectors of the sequence, and taking the result as entity semantic features through a ReLU activation function, wherein a calculation formula is as follows

And 2.6, processing RH different attention units and mapping the RH different attention units into a final entity semantic feature vector in order to integrate semantic features learned in different aspects. The calculation mode is that

e _p ＝ReLU(W ₂ E+b) (13)

wherein

Is a mapping matrix, E is a matrix of outputs of RH different attention cells, and b is a bias vector. W (W) ₂ And the value of b may be initialized with a normal distribution at the beginning of training.

Step 3, using the name tag words of the relation as input, extracting semantic features of the tag word set by adopting a self-attention mechanism, and outputting a vector r _p . The method specifically comprises the following steps:

step 3.1, preprocessing the relation names: and for each relation r in the knowledge graph, word segmentation is carried out on the names of the relations r by using a word segmentation tool, so as to obtain a tag word sequence. The tag word set of { accident, traffic, accident, response, part } is obtained after the relationship name "/accident/traffic_accident/res-active_part" is processed.

Step 3.2, constructing a text feature extraction model:

similar to the model of entity semantic feature extraction, the model is composed of a single-layer multi-unit self-attention module, each layer has RH self-attention units, and as the relation description contains fewer words and fewer word ranges, the extraction model of the relation semantic features only contains CH=1 self-attention layers, namely 1 self-attention layer in the longitudinal direction, and each layer transversely contains RH self-attention units. Wherein each self-attention unit has a different parameter matrix. In the network model building, RH can be set in a self-defined manner, and is generally selected from {1,2,3,4 }.

Step 3.3, the input of the model is the vector representation of each tag word in a sequence, and the calculation mode is consistent with that of step 2.3, firstly, the word vector of each word is initialized by using word2vec to obtain the word vector of each word

The embedding dimension k is selected from {50, 70, 80, 100 }. Calculating a position vector of each tag word using equation (8) and equation (9)

And 3.4, calculating the matching degree between the tag words by using a self-attention mechanism, and multiplying the matching degree with the original word vector to obtain an attention vector, wherein the calculation mode is consistent with that of the step 2.4.

And 3.5, after feature extraction of CH attention layers, adding all output vectors of the sequence and using a ReLU activation function as a relation semantic feature. The calculation mode is consistent with the step 2.5.

Step 3.6, in order to integrate the semantic features learned in different aspects, at last, the RH different attention units are processed and mapped into a final relational semantic feature vector r _p . The calculation mode is consistent with the step 2.6

And 4, constructing a mapping matrix by utilizing semantic feature vectors of the entities and the relations, and constructing a triplet scoring function, namely an energy equation, based on the translation thought. The method specifically comprises the following steps:

step 4.1, for a triplet (h, r, t), setting a projection matrix M for the head entity and the tail entity, respectively _rh and M_rt For projecting an entity from an entity space to a relationship space. The projection matrix is constructed by semantic feature vectors of the entities and the relations obtained in the step 2 and the step 3, and the calculation mode is that

wherein B^k×d Is a parameter matrix to be learned.

Step 4.2, multiplying the head and tail entities and respective projection matrixes to calculate projections of the entities in a relation space, namely: h is a _⊥ ＝M _rh h、t _⊥ ＝M _rt t。

Step 4.3, constructing a triplet scoring function (i.e. an energy equation) as follows the idea of a translation model in a relation space, regarding the relation as a translation operation from a head entity to a tail entity

And 5, constructing a loss function based on the interval according to the triplet scoring function, taking the knowledge-graph triplet as a training set, training a model by adopting a gradient descent optimization algorithm, and finally obtaining vector representation of the entity and the relation. The method comprises the following detailed steps:

and 5.1, defining an interval-based hinge loss function by taking all original triples T in the knowledge graph as a training set so as to train a model. The goal is to have the triplet scoring function get a lower score (energy) for positive case triples and a higher score for negative case triples. The loss function is

wherein ,T′_(h，r，t) = { (h ', r, t) |h' ∈e, h '+.h }) u { (h, r, t')|t '∈e, t' +.t } is a negative example set constructed on the basis of triples (h, r, t). The interval value gamma > 0 is a super parameter and can be selected from {1,2,3,4 }.

Step 5.2, for any entity, forcing its vector L2 norm to be 1, i.e

Therefore, the entity embedding vector is regularized into a unit sphere, and invalid convergence of the objective function can be prevented by artificially increasing the entity embedding specification. />

In the training process, the fact triples (training sets) of the knowledge graph are randomly traversed a plurality of times, and when each fact triplet is accessed, a negative example triplet is constructed for the fact triplet. The negative example entity is not selected in the entity set, but a K-neighbor method is adopted, the similarity between the entity to be replaced and other entities is calculated by utilizing a cosine similarity algorithm, the entities are ordered from high to low, and top-K entities are taken as negative example candidate sets of the entity to be replaced.

Step 5.4 optimization of the objective function Using a small batch gradient descent (Mini-batch Gradient Descent)

The learning rate μ is selected from {0.1,0.01,0.001}, and the batch size value B is selected from {200, 500, 1400, 4800 }. After a small batch, gradients are calculated and model parameters are updated.

Based on the above scheme of the invention, compared with the traditional technology, the invention has at least the following advantages:

the translation models TransE, transH, transR, transD and the like are all structural features only modeling the interior of the triples, and have the disadvantage of ignoring other semantic features of entity relations in the knowledge graph. The text description of the entity is embedded by other multisource information embedding methods such as TEKE, DKRL and the like on the basis of the TransE, so that semantic features for describing the text are introduced for the entity, and the following disadvantages still exist: firstly, the used TransE cannot meet the complex relation of many-to-many in the knowledge graph, and secondly, semantic features are only integrated for the entity. The invention respectively embeds the text corpus of the entity description and the relation description, and uses the text corpus to construct a projection matrix of the entity to the relation space, and finally performs representation learning based on translation ideas in the relation space. The semantic features of the entity and the relation are considered at the same time, and the structure embedding and the text embedding are combined skillfully.

the entity description text is processed by the TEKE based on the co-occurrence network of the word and the entity, the entity description text is processed by the DKRL by using a continuous word bag model or a convolutional neural network, the modes belong to more traditional modes on natural language processing, and the advantage of an attention mechanism on natural language processing is benefited.

choosing one arbitrarily from the whole entity set for replacement may result in the generation of very easily distinguishable negative triples, such as for triples (beijin, caps of, china), by replacing head entities (Water, caps of, china) or replacing tail entities (beijin, caps of, airland), which are all obviously erroneous or even not logical. The Beijing and Water correspondence vectors are inherently far apart in the same space, so that the negative case triples have little learning contribution to the model. In contrast, (Hong Kong, caps of, china) is a close but erroneous triplet. The K-neighbor negative sampling method can improve the quality of the negative case triples, thereby enhancing the learning of the model and enabling the final model to better distinguish correct triples from error triples.

Claims

1. The knowledge graph representation learning method based on the attention mechanism integrated with the text semantic features is characterized by comprising the following steps of:

step 4, constructing a mapping matrix based on text semantic feature vectors of the entities and the relations, and constructing a triplet scoring function based on the thought of the translation model; the step 4 specifically comprises the following steps:

step 4.1, for a triplet (h, r, t), setting a projection matrix M for the head entity and the tail entity, respectively _rh and M_rt For projecting an entity from an entity space to a relationship space; the projection matrix is constructed by semantic feature vectors of the entities and the relations obtained in the step 2 and the step 3, and the calculation mode is as follows:

wherein ,B^k×d Is a parameter matrix to be learned; h is a _p A semantic feature vector representing the head entity h,

is h _p Is a transpose of (2); t is t _p Semantic feature vector representing tail entity t, +.>

At t _p Is a transpose of (2); r is (r) _p A semantic feature vector representing a relationship r;

2. The knowledge graph representation learning method based on the attention mechanism integrated with text semantic features according to claim 1,

the method is characterized in that the step 2 specifically comprises the following steps:

step 2.1, acquiring entity description text and preprocessing:

step 2.2, constructing a text feature extraction model:

constructing a network model consisting of multi-layer multi-unit self-attention modules, wherein the model is formed by stacking 3 identical layers in total, namely 3 identical layers exist in the longitudinal direction, each layer transversely comprises RII self-attention units for processing input so as to learn the characteristics of word sequences from different aspects, and each self-attention unit has different parameter matrixes; RII can be set in a self-defined manner;

step 2.3, obtaining vector representation x of each word as input of a model:

the vector representation of each word is represented by its word vector