CN108647258B - Representation learning method based on entity relevance constraint - Google Patents
- Publication number: CN108647258B (application CN201810377516.6A)
- Authority
- CN
- China
- Prior art keywords
- entity
- entities
- relevance
- batch
- vector
- Prior art date: 2018-01-24
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The invention relates to a representation learning method based on entity relevance constraint, belonging to the technical fields of natural language processing and knowledge graphs. The method annotates the description text of each entity and performs association division on it to obtain the strongly associated and weakly associated entity sets of the entity; this relevance is fused as an auxiliary loss term into a translation-based representation learning method. Specifically, the embedded representations of entities and relations are obtained through negative sampling and model training: a head entity h and a tail entity t in the knowledge graph and the relation r between them are embedded into vector h, vector t, and vector r, respectively. The method is superior to translation-based and text-based representation learning methods in reasoning effect.
Description
Technical Field
The invention relates to a representation learning method based on entity relevance constraint, and belongs to the technical fields of natural language processing and knowledge graphs.
Background
A Knowledge Graph is a knowledge representation method based on semantic networks; it provides an efficient and concise structured representation and plays a key role in Web search and intelligent question answering. A knowledge graph represents real-world data as entities and relations: knowledge is stored as (entity, relation, entity) triples, and entities connected by relations form a network knowledge structure. Although a knowledge graph contains a large number of entities and relationships, it is never complete, because not all knowledge can be extracted during its construction. Knowledge graph reasoning technology can be applied to automatic completion of the graph, for example predicting the possible relation between two entities, and, combined with open-domain information extraction, to quality evaluation of extraction results. Representation learning embeds the entities and relations of the knowledge graph into a low-dimensional space and performs knowledge graph inference in that space.
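For concreteness, the triple storage described above can be pictured as follows; this is a minimal illustrative sketch, and the entity and relation names are hypothetical, not taken from the patent.

```python
# A knowledge graph stored as (head entity, relation, tail entity) triples;
# the concrete entities and relations here are hypothetical examples.
triples = {
    ("Beijing", "capital_of", "China"),
    ("China", "located_in", "Asia"),
}

# Entities connected by relations form the network knowledge structure.
entities = {e for (h, _, t) in triples for e in (h, t)}
relations = {r for (_, r, _) in triples}
```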
Currently, mainstream representation learning methods regard a relation as a translation between entities and are called translation-based models; a representative example is the TransE model (Bordes A., Usunier N., Weston J., et al. Translating Embeddings for Modeling Multi-relational Data [C] // International Conference on Neural Information Processing Systems. Curran Associates Inc., 2013: 2787-2795), which treats the relation as a translation operation from the head entity to the tail entity. However, such models use only the structure of the graph and ignore the rich semantic information in entity description texts. Some researchers have therefore proposed representation learning methods that combine the entity description text with the graph structure; these belong to the text representation models, a representative one being the DKRL model (Xie R., Liu Z., Jia J., Luan H., & Sun M. (2016). Representation Learning of Knowledge Graphs with Entity Descriptions. AAAI 2016).
Although existing representation learning methods perform well on knowledge graph reasoning, the semantic associations between entities in text remain largely unmined, leaving considerable room to improve reasoning performance. The invention aims to overcome the technical defect that semantic association information among entities is lost during the training of traditional representation learning methods, and provides a representation learning method based on entity relevance constraint.
Disclosure of Invention
The invention aims to provide a representation learning method based on entity relevance constraint, addressing the problems that translation-based models do not utilize the rich semantic information in text and that text representation models lose semantic relevance information among entities during training.
The core idea of the invention is as follows: mine associated entities from entity description texts, grade the degree of association, and fuse this relevance as an auxiliary constraint into a translation-based representation learning method. The annotated entity description text is used mainly to obtain co-occurrence information between entities; this information serves as the criterion for measuring the semantic association degree between two entities, and the association degree is directional. Concretely, a head entity h, a tail entity t, and the relation r between them in the knowledge graph are embedded into vector h, vector t, and vector r, respectively.
The invention is realized by the following steps:
firstly, annotating and performing relevance division on a description text of an entity to obtain a strong relevance entity set and a weak relevance entity set of the entity; the method specifically comprises the following substeps:
step 1.1, annotating description texts of entities to obtain entity annotation results;
wherein the entity refers to an entity in the knowledge graph, represented by e; the description text of e, denoted Des_e, is an ordered word set, expressed by formula (1):

Des_e = <w_1, ..., w_m>  (1)

where w_1, ..., w_m are words and m is the number of words in the description text; an entity extracted from the description text consists of one or more words, and when an entity consists of two or more words, the extracted words are spliced together;
the process of extracting entities from descriptive text is called descriptive text annotation; and (3) forming a set by the entities extracted from the description text to obtain an entity annotation result:
Des_e' = <w_1, ..., w_m'>  (2)

where m' ≤ m, each w_i represents an entity, and Des_e' is the entity annotation result of Des_e;
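To make the annotation step concrete, the following is a minimal sketch of extracting entities from a tokenized description text, assuming a pre-built entity vocabulary; greedy longest-match is an assumption of this sketch, since the patent does not fix a matching strategy, and all names are illustrative.

```python
def annotate(description_words, entity_vocab, max_len=4):
    """Extract entities from a tokenized description text Des_e = <w_1, ..., w_m>,
    producing the annotation result Des_e' = <w_1, ..., w_m'> with m' <= m.

    Multi-word entities are spliced with '_' as the text prescribes; greedy
    longest-match over an entity vocabulary is an assumption of this sketch.
    """
    result, i = [], 0
    while i < len(description_words):
        match = None
        # Try the longest span first so multi-word entities win over single words.
        for n in range(min(max_len, len(description_words) - i), 0, -1):
            candidate = "_".join(description_words[i:i + n])
            if candidate in entity_vocab:
                match = (candidate, n)
                break
        if match is not None:
            result.append(match[0])
            i += match[1]
        else:
            i += 1
    return result

# Example: annotate("the capital of China is Beijing".split(),
#                   {"China", "Beijing"}) returns ["China", "Beijing"].
```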
step 1.2, relevance division;
Using the i-th and j-th entities in the entity annotation result output by step 1.1, the association degree value of entity j to entity i, denoted W_ij, is obtained through formula (3):

W_ij = 2 if e_j appears in Des_{e_i}' and e_i appears in Des_{e_j}'; W_ij = 1 if e_j appears in Des_{e_i}' but e_i does not appear in Des_{e_j}'; W_ij = 0 otherwise  (3)

If W_ij = 2, j is denoted a strongly associated entity (Strong Relevant Entity) of i; if W_ij = 1, j is denoted a weakly associated entity (Weak Relevant Entity) of i; if two entities appear in each other's descriptions, the association becomes strong. From this, the strongly associated and weakly associated entity sets of an entity e are obtained.

The association degree value W_ij is directional; traversing all entities in the entity annotation result yields the entity association matrix composed of association degree values, denoted W ∈ ℝ^{|E|×|E|}, where E is the set of entities in the knowledge graph and |E| is the total number of entities.
The strongly associated entity set of entity e is denoted S(e):

S(e) = { e_i | W_{e,e_i} = 2 }  (4)

where e_i represents the i-th entity and W_{e,e_i} = 2 indicates that the relationship between entity e and entity e_i is strong association;
the weakly associated entity set of entity e is denoted W(e):

W(e) = { e_i | W_{e,e_i} = 1 }  (5)

where W_{e,e_i} = 1 indicates that the relationship between entity e and entity e_i is weak association.
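A sketch of the relevance division, under the reconstruction of formula (3) given above (mutual co-occurrence gives W_ij = 2, one-way co-occurrence gives W_ij = 1); function and variable names are illustrative.

```python
from itertools import permutations

def relevance_division(annotations):
    """Compute the directional association values W_ij and the strongly and
    weakly associated entity sets S(e) and W(e) from entity annotation results.

    `annotations` maps each entity to the set of entities appearing in its
    annotated description Des'. Mutual co-occurrence gives W_ij = 2 (strong),
    one-way co-occurrence gives W_ij = 1 (weak), per the reconstruction of
    formula (3) above.
    """
    W = {}
    strong = {e: set() for e in annotations}
    weak = {e: set() for e in annotations}
    for i, j in permutations(annotations, 2):
        if j in annotations[i] and i in annotations[j]:
            W[(i, j)] = 2          # both appear in each other's description
            strong[i].add(j)
        elif j in annotations[i]:
            W[(i, j)] = 1          # j appears in i's description only
            weak[i].add(j)
        else:
            W[(i, j)] = 0          # no co-occurrence
    return W, strong, weak
```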
Secondly, carrying out sample negative sampling and model training to obtain the embedded representations of entities and relations;
the model training is based on a mini-batch stochastic gradient descent algorithm; step two specifically comprises the following substeps:
Step 2.1, initialize the cycle count value to 1 and set the cycle count maximum; the cycle count value is denoted k, and the maximum cycle count is denoted iter;
Step 2.2, let S represent the triple set of the knowledge graph; a triple of the knowledge graph is a positive sample, i.e., S is the positive sample set. Randomly extract B positive samples from S to obtain a subset S_batch, and let T_batch = ∅. The construction of T_batch comprises the following substeps:
Step 2.2.1, traverse S_batch and negatively sample each positive sample (h, r, t) as described in document 1 (Feng J. Knowledge Graph Embedding by Translating on Hyperplanes [C] // AAAI, 2014): given the relation r, the average number tph of tail entities per head entity in the negative sampling method of document 1 corresponds to tph_r of this patent, and the average number hpt of head entities per tail entity corresponds to hpt_r;
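The per-relation statistics tph_r and hpt_r can be computed from the positive sample set S as in the following sketch (names are illustrative):

```python
from collections import defaultdict

def tph_hpt(S):
    """Per-relation statistics used by the sampler of document 1: tph_r is
    the average number of tail entities per head entity under relation r,
    and hpt_r the average number of head entities per tail entity."""
    tails_of = defaultdict(lambda: defaultdict(set))  # r -> h -> {t}
    heads_of = defaultdict(lambda: defaultdict(set))  # r -> t -> {h}
    for h, r, t in S:
        tails_of[r][h].add(t)
        heads_of[r][t].add(h)
    tph = {r: sum(len(ts) for ts in hs.values()) / len(hs)
           for r, hs in tails_of.items()}
    hpt = {r: sum(len(hs_) for hs_ in ts.values()) / len(ts)
           for r, ts in heads_of.items()}
    return tph, hpt
```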
Generate a uniformly distributed random number p on the interval [0, 1]. If p ≤ tph_r/(tph_r + hpt_r), an entity is drawn with equal probability from the entity set E of the knowledge graph to replace the head entity of the positive sample, ensuring that the replaced triple does not belong to S; if p > tph_r/(tph_r + hpt_r), an entity is drawn with equal probability from E to replace the tail entity of the positive sample, again ensuring that the replaced triple does not belong to S;
Step 2.2.2, after replacement, a negative sample (h', r, t') corresponding to each positive sample (h, r, t) in S_batch is obtained; each positive sample and its negative sample are added to the T_batch set:

T_batch ← T_batch ∪ {(h, r, t), (h', r, t')}  (6)

After steps 2.2.1 and 2.2.2, the T_batch set is obtained; the entity set appearing in T_batch is extracted and denoted E_batch;
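A sketch of steps 2.2.1 and 2.2.2 combined, assuming the tph_r/hpt_r statistics above; it pairs each positive sample with its negative sample rather than storing them in one flat set, which is an implementation convenience, not something the patent prescribes.

```python
import random

def negative_sample(batch, S, entities, tph, hpt):
    """Bernoulli negative sampling of steps 2.2.1-2.2.2: with probability
    tph_r / (tph_r + hpt_r) corrupt the head entity, otherwise the tail,
    drawing replacements uniformly from E and rejecting corrupted triples
    that already belong to the positive set S."""
    T_batch = []
    for h, r, t in batch:
        p = random.random()                      # uniform on [0, 1]
        corrupt_head = p <= tph[r] / (tph[r] + hpt[r])
        while True:
            e = random.choice(entities)          # equal-probability draw from E
            corrupted = (e, r, t) if corrupt_head else (h, r, e)
            if corrupted not in S:               # ensure the negative is not in S
                break
        T_batch.append(((h, r, t), corrupted))   # positive paired with negative
    return T_batch
```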
Step 2.3, training the model based on the mini-batch stochastic gradient descent algorithm;
Step 2.3.1, calculate the score of each triple (h, r, t) through the triple scoring function f_r(h, t) of formula (7);
Step 2.3.2, calculate the entity-relevance-based loss term L_r through formula (8):

L_r = Σ_{e ∈ E_batch} [ α · Σ_{e' ∈ S(e)} max(0, ||e − e'||_2^2 − SC) + β · Σ_{e' ∈ W(e)} max(0, ||e − e'||_2^2 − WC) ]  (8)

where α and β are the strong and weak association weights; α determines the strength of the strong association constraint and β that of the weak association constraint; e ranges over the entities of E_batch; in the left term of formula (8), e' ranges over the strongly associated entity set of e, and in the right term over the weakly associated entity set of e; ||e − e'||_2^2 is the square of the 2-norm of the vector e − e'; SC and WC are user-specified strong and weak association hyper-parameters, giving the distance limits between two associated entities. When an entity pair is within the corresponding range, the loss is 0, so L_r keeps the distance of an associated entity pair in vector space from exceeding a certain bound instead of monotonically minimizing it;
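The loss term L_r of formula (8), as reconstructed above, can be sketched as follows; the default hyper-parameter values are those of Example 1 (α = 1, β = 0.3, SC = WC = 1), and all names are illustrative.

```python
import numpy as np

def relevance_loss(emb, E_batch, strong, weak,
                   alpha=1.0, beta=0.3, SC=1.0, WC=1.0):
    """Auxiliary loss term L_r of formula (8) as reconstructed above.

    `emb` maps each entity to its vector. A pair contributes nothing while
    its squared distance stays within the bound SC (strong) or WC (weak),
    so associated pairs are constrained rather than collapsed together.
    """
    loss = 0.0
    for e in E_batch:
        for assoc, weight, bound in ((strong.get(e, ()), alpha, SC),
                                     (weak.get(e, ()), beta, WC)):
            for e2 in assoc:
                if e2 not in emb:
                    continue
                d = float(np.sum((emb[e] - emb[e2]) ** 2))  # ||e - e'||_2^2
                loss += weight * max(0.0, d - bound)
    return loss
```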
Step 2.3.3, calculate the loss function value of the model according to formula (9):

Loss = Σ_{((h,r,t),(h',r,t')) ∈ T_batch} max(0, f_r(h, t) + γ − f_r(h', t')) + L_r  (9)

where Loss denotes the loss function value of the model; f_r(h, t) is the score of the positive sample (h, r, t) and f_r(h', t') the score of the negative sample (h', r, t'); during training, positive samples tend toward low scores and negative samples toward high scores; γ is the loss margin, used to control the separation between f_r(h, t) and f_r(h', t');
Step 2.3.4, compute the derivative of Loss with respect to each independent variable in formula (9) and update according to formula (10):

θ ← θ − rate · ∂Loss/∂θ  (10)

where θ is an independent variable, covering all h, r, and t; rate is the learning rate; ∂Loss/∂θ denotes the derivative of the model loss function value Loss with respect to the argument θ;
Step 2.3.5, judge whether the cycle count value k has reached the maximum value iter; if k = iter, the method is complete; otherwise set k = k + 1 and jump to step 2.2;
At this point, through steps one and two, the embedded representations of entities and relations, namely vector h, vector t, and vector r, have been obtained, completing the representation learning method based on entity relevance constraint.
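Putting steps 2.1 through 2.3.5 together, the following sketch trains the embeddings with manually derived gradients; it assumes the squared-distance score f_r(h, t) = ||h + r − t||_2^2, which the patent does not state explicitly but which is consistent with the translation-based model it builds on, and it reuses the negative_sample routine sketched earlier. Hyper-parameter defaults follow Example 1.

```python
import random
import numpy as np

def train(S, entities, strong, weak, tph, hpt, dim=50, B=100, gamma=1.0,
          alpha=1.0, beta=0.3, SC=1.0, WC=1.0, rate=0.1, iter_max=500):
    """Steps 2.1-2.3.5 with manually derived gradients, assuming the score
    f_r(h, t) = ||h + r - t||_2^2 (an assumption; the patent only says
    'translation-based'). Reuses negative_sample() sketched above."""
    rng = np.random.default_rng(0)
    ent = {e: rng.normal(scale=0.1, size=dim) for e in entities}
    rel = {r: rng.normal(scale=0.1, size=dim) for (_, r, _) in S}
    S_list = list(S)
    for k in range(1, iter_max + 1):                        # steps 2.1 / 2.3.5
        batch = random.sample(S_list, min(B, len(S_list)))  # S_batch
        T_batch = negative_sample(batch, S, list(entities), tph, hpt)
        E_batch = {e for pos, neg in T_batch
                   for e in (pos[0], pos[2], neg[0], neg[2])}
        for (h, r, t), (h2, _, t2) in T_batch:              # margin loss (9)
            d_pos = ent[h] + rel[r] - ent[t]
            d_neg = ent[h2] + rel[r] - ent[t2]
            if gamma + d_pos @ d_pos - d_neg @ d_neg > 0:   # active hinge term
                ent[h] -= rate * 2 * d_pos                  # SGD step, formula (10)
                ent[t] += rate * 2 * d_pos
                rel[r] -= rate * 2 * (d_pos - d_neg)
                ent[h2] += rate * 2 * d_neg
                ent[t2] -= rate * 2 * d_neg
        for e in E_batch:                                   # relevance term (8)
            for assoc, w, bound in ((strong.get(e, ()), alpha, SC),
                                    (weak.get(e, ()), beta, WC)):
                for e2 in assoc:
                    if e2 not in ent:
                        continue
                    diff = ent[e] - ent[e2]
                    if diff @ diff > bound:                 # outside distance limit
                        ent[e] -= rate * w * 2 * diff
                        ent[e2] += rate * w * 2 * diff
    return ent, rel
```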
Advantageous effects
Compared with the prior art, the representation learning method based on entity relevance constraint has the following beneficial effects:
1. Traditional representation learning methods rely on the structural information of the knowledge graph and do not make full use of the description text of entities. The invention provides a method for measuring the semantic association of entities from their description texts, constructs a constraint term based on this entity association, and fuses it into a traditional representation learning method. Experimental results show that, compared with traditional methods, the invention achieves a better reasoning effect on the link prediction and triple classification tasks over open datasets, at a speed consistent with the traditional methods;
2. Text-based representation learning methods mostly vectorize the text as a whole and omit the semantic associations among entities within it. The invention mines entity association constraint terms from the entities in the text and models the semantic associations among entities more finely; experimental results show a better reasoning effect compared with the text representation model DKRL.
Drawings
Fig. 1 is a schematic flow diagram of the representation learning method based on entity association constraint of the invention and of Embodiment 1.
Detailed Description
The invention is further illustrated and described in detail below with reference to the figures and examples.
Example 1
This embodiment describes a specific implementation process of the representation learning method based on entity association constraint according to the present invention, and fig. 1 is an implementation flow diagram of this embodiment.
As can be seen from fig. 1, the specific implementation steps of the present invention and the embodiment are as follows:
Step A, annotating and performing relevance division on the description text of an entity to obtain the strongly associated and weakly associated entity sets of the entity; this step specifically comprises the following substeps:
Step A.1, annotating the description texts of entities to obtain entity annotation results;
wherein the entity refers to an entity in the knowledge graph, represented by e; the description text of e, denoted Des_e, is an ordered word set, expressed by formula (11):

Des_e = <w_1, ..., w_m>  (11)

where w_1, ..., w_m are words and m is the number of words in the description text; an entity extracted from the description text consists of one or more words, and when an entity consists of two or more words, the extracted words are spliced together;
the process of extracting entities from descriptive text is called descriptive text annotation; and (3) forming a set by the entities extracted from the description text to obtain an entity annotation result:
Des_e' = <w_1, ..., w_m'>  (12)

where m' ≤ m, each w_i represents an entity, and Des_e' is the entity annotation result of Des_e;
step A.2, relevance division;
Using the i-th and j-th entities in the entity annotation result output by step A.1, the association degree value of entity j to entity i, denoted W_ij, is obtained through formula (13):

W_ij = 2 if e_j appears in Des_{e_i}' and e_i appears in Des_{e_j}'; W_ij = 1 if e_j appears in Des_{e_i}' but e_i does not appear in Des_{e_j}'; W_ij = 0 otherwise  (13)

If W_ij = 2, j is denoted a strongly associated entity (Strong Relevant Entity) of i; if W_ij = 1, j is denoted a weakly associated entity (Weak Relevant Entity) of i; if two entities appear in each other's descriptions, the association becomes strong. From this, the strongly associated and weakly associated entity sets of an entity e are obtained.

The association degree value W_ij is directional; traversing all entities in the entity annotation result yields the entity association matrix W ∈ ℝ^{|E|×|E|}, where E is the set of entities in the knowledge graph and |E| is the total number of entities.
The strongly associated entity set of entity e is denoted S(e):

S(e) = { e_i | W_{e,e_i} = 2 }  (14)

The weakly associated entity set of entity e is denoted W(e):

W(e) = { e_i | W_{e,e_i} = 1 }  (15)
Step B, carrying out sample negative sampling and model training to obtain the embedded representations of entities and relations;
the model training is based on a mini-batch stochastic gradient descent algorithm; step B specifically comprises the following substeps:
Step B.1, initialize the cycle count value, denoted k, with k = 1;
Step B.2, let S represent the triple set of the knowledge graph; a triple of the knowledge graph is a positive sample, i.e., S is the positive sample set. Randomly extract B positive samples from S to obtain a subset S_batch, where B = 100, and let T_batch = ∅. The construction of T_batch comprises the following substeps:
Step B.2.1, traverse S_batch and negatively sample each positive sample (h, r, t) as described in document 1 (Feng J. Knowledge Graph Embedding by Translating on Hyperplanes [C] // AAAI, 2014): given the relation r, the average number tph of tail entities per head entity in the negative sampling method of document 1 corresponds to tph_r of this patent, and the average number hpt of head entities per tail entity corresponds to hpt_r;
Generate a uniformly distributed random number p on the interval [0, 1]. If p ≤ tph_r/(tph_r + hpt_r), an entity is drawn with equal probability from the entity set E of the knowledge graph to replace the head entity of the positive sample, ensuring that the replaced triple does not belong to S; if p > tph_r/(tph_r + hpt_r), an entity is drawn with equal probability from E to replace the tail entity of the positive sample, again ensuring that the replaced triple does not belong to S;
Step B.2.2, after replacement, a negative sample (h', r, t') corresponding to each positive sample (h, r, t) in S_batch is obtained; each positive sample and its negative sample are added to the T_batch set:

T_batch ← T_batch ∪ {(h, r, t), (h', r, t')}  (16)

After steps B.2.1 and B.2.2, the T_batch set is obtained; the entity set appearing in T_batch is extracted and denoted E_batch;
Step B.3, training the model based on the mini-batch stochastic gradient descent algorithm;
Step B.3.1, calculate the score of each triple (h, r, t) through the triple scoring function f_r(h, t) of formula (7);
Step B.3.2, calculate the entity-relevance-based loss term L_r through formula (8), where α and β are the strong and weak association weights and SC and WC the strong and weak association ranges; here α = 1, β = 0.3, SC = 1, and WC = 1;
Step B.3.3, calculate the loss function value of the model according to formula (9), where the loss margin γ = 1;
Step B.3.4, compute the derivative of Loss with respect to each independent variable in formula (9) and update according to formula (10), where θ is an independent variable covering all h, r, and t, and rate is the learning rate, rate = 0.1;
Step B.3.5, judge whether the cycle count value k has reached the maximum value iter, where iter = 500; if k = iter, the method is complete; otherwise set k = k + 1 and jump to step B.2.
At this point, through steps A and B, the embedded representations of entities and relations, namely vector h, vector t, and vector r, have been obtained, completing the representation learning method based on entity relevance constraint.
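For reference, the hyper-parameter settings of Example 1 collected in one place; the dictionary keys match the hypothetical train() sketch given earlier.

```python
# Hyper-parameter settings of Example 1, gathered for reference; the keys
# match the hypothetical train() sketch above.
example1_config = dict(
    B=100,          # positive samples per mini-batch (step B.2)
    alpha=1.0,      # strong-association weight
    beta=0.3,       # weak-association weight
    SC=1.0,         # strong-association distance bound
    WC=1.0,         # weak-association distance bound
    gamma=1.0,      # loss margin in formula (9)
    rate=0.1,       # learning rate
    iter_max=500,   # maximum cycle count iter
)
```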
Claims (1)
1. A representation learning method based on entity relevance constraint, characterized in that the core idea is as follows: mine associated entities from entity description texts, grade the degree of association, and fuse this relevance as an auxiliary constraint into a translation-based representation learning method; the annotated entity description text is used mainly to obtain co-occurrence information between entities, this information serving as the criterion for measuring the semantic association degree between two entities, the association degree being directional; concretely, a head entity h, a tail entity t, and the relation r between them in the knowledge graph are embedded into vector h, vector t, and vector r, respectively; the method is realized through the following steps:
firstly, annotating and performing relevance division on a description text of an entity to obtain a strong relevance entity set and a weak relevance entity set of the entity; the method specifically comprises the following substeps:
step 1.1, annotating description texts of entities to obtain entity annotation results;
wherein the entity refers to an entity in the knowledge graph, represented by e; the description text of e, denoted Des_e, is an ordered word set, expressed by formula (1):

Des_e = <w_1, ..., w_m>  (1)

where w_1, ..., w_m are words and m is the number of words in the description text; an entity extracted from the description text consists of one or more words, and when an entity consists of two or more words, the extracted words are spliced together;
the process of extracting entities from descriptive text is called descriptive text annotation; and (3) forming a set by the entities extracted from the description text to obtain an entity annotation result:
Des_e' = <w_1, ..., w_m'>  (2)

where m' ≤ m, each w_i represents an entity, and Des_e' is the entity annotation result of Des_e;
step 1.2, relevance division;
Using the i-th and j-th entities in the entity annotation result output by step 1.1, the association degree value of entity j to entity i, denoted W_ij, is obtained through formula (3):

W_ij = 2 if e_j appears in Des_{e_i}' and e_i appears in Des_{e_j}'; W_ij = 1 if e_j appears in Des_{e_i}' but e_i does not appear in Des_{e_j}'; W_ij = 0 otherwise  (3)

If W_ij = 2, j is denoted a strongly associated entity (Strong Relevant Entity) of i; if W_ij = 1, j is denoted a weakly associated entity (Weak Relevant Entity) of i; if two entities appear in each other's descriptions, the association becomes strong. From this, the strongly associated and weakly associated entity sets of an entity e are obtained.

The association degree value W_ij is directional; traversing all entities in the entity annotation result yields the entity association matrix composed of association degree values, denoted W ∈ ℝ^{|E|×|E|}, where E is the set of entities in the knowledge graph and |E| is the total number of entities.
The strongly associated entity set of entity e is denoted S(e):

S(e) = { e_i | W_{e,e_i} = 2 }  (4)

where e_i represents the i-th entity and W_{e,e_i} = 2 indicates that the relationship between entity e and entity e_i is strong association;
the weakly associated entity set of entity e is denoted W(e):

W(e) = { e_i | W_{e,e_i} = 1 }  (5)

where W_{e,e_i} = 1 indicates that the relationship between entity e and entity e_i is weak association;

secondly, carrying out sample negative sampling and model training to obtain the embedded representations of entities and relations; this step specifically comprises the following substeps:
Step 2.1, initialize the cycle count value to 1 and set the cycle count maximum; the cycle count value is denoted k, and the maximum cycle count is denoted iter;
Step 2.2, let S represent the triple set of the knowledge graph; a triple of the knowledge graph is a positive sample, i.e., S is the positive sample set; randomly extract B positive samples from S to obtain a subset S_batch, and let T_batch = ∅; the construction of T_batch comprises the following substeps:
Step 2.2.1, traverse S_batch, negatively sampling each positive sample (h, r, t);
Generate a uniformly distributed random number p on the interval [0, 1]. If p ≤ tph_r/(tph_r + hpt_r), an entity is drawn with equal probability from the entity set E of the knowledge graph to replace the head entity of the positive sample, ensuring that the replaced triple does not belong to S; if p > tph_r/(tph_r + hpt_r), an entity is drawn with equal probability from E to replace the tail entity of the positive sample, again ensuring that the replaced triple does not belong to S;
Step 2.2.2, after replacement, a negative sample (h', r, t') corresponding to each positive sample (h, r, t) in S_batch is obtained; each positive sample and its negative sample are added to the T_batch set:

T_batch ← T_batch ∪ {(h, r, t), (h', r, t')}  (6)
After steps 2.2.1 and 2.2.2, the T_batch set is obtained; the entity set appearing in T_batch is extracted and denoted E_batch;
Step 2.3, training the model based on the mini-batch stochastic gradient descent algorithm;
Step 2.3.1, calculate the score of each triple (h, r, t) through the triple scoring function f_r(h, t) of formula (7);
Step 2.3.2, calculate the entity-relevance-based loss term L_r through formula (8):

L_r = Σ_{e ∈ E_batch} [ α · Σ_{e' ∈ S(e)} max(0, ||e − e'||_2^2 − SC) + β · Σ_{e' ∈ W(e)} max(0, ||e − e'||_2^2 − WC) ]  (8)

where α and β are the strong and weak association weights; α determines the strength of the strong association constraint and β that of the weak association constraint; e ranges over the entities of E_batch; in the left term of formula (8), e' ranges over the strongly associated entity set of e, and in the right term over the weakly associated entity set of e; ||e − e'||_2^2 is the square of the 2-norm of the vector e − e'; SC and WC are user-specified strong and weak association hyper-parameters, giving the distance limits between two associated entities. When an entity pair is within the corresponding range, the loss is 0, so L_r keeps the distance of an associated entity pair in vector space from exceeding a certain bound instead of monotonically minimizing it;
Step 2.3.3, calculate the loss function value of the model according to formula (9):

Loss = Σ_{((h,r,t),(h',r,t')) ∈ T_batch} max(0, f_r(h, t) + γ − f_r(h', t')) + L_r  (9)

where Loss denotes the loss function value of the model; f_r(h, t) is the score of the positive sample (h, r, t) and f_r(h', t') the score of the negative sample (h', r, t'); during training, positive samples tend toward low scores and negative samples toward high scores; γ is the loss margin, used to control the separation between f_r(h, t) and f_r(h', t');
Step 2.3.4, compute the derivative of Loss with respect to each independent variable in formula (9) and update according to formula (10):

θ ← θ − rate · ∂Loss/∂θ  (10)

where θ is an independent variable, covering all h, r, and t; rate is the learning rate; ∂Loss/∂θ denotes the derivative of the model loss function value Loss with respect to the argument θ;
Step 2.3.5, judge whether the cycle count value k has reached the maximum value iter; if k = iter, the method is complete; otherwise set k = k + 1 and jump to step 2.2;
At this point, through steps one and two, the embedded representations of entities and relations are obtained: vector h, vector t, and vector r.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN2018100675556 | 2018-01-24 | |
CN201810067555 | 2018-01-24 | |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108647258A (en) | 2018-10-12
CN108647258B true CN108647258B (en) | 2020-12-22 |
Family
ID=63747612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810377516.6A Active CN108647258B (en) | 2018-01-24 | 2018-04-25 | Representation learning method based on entity relevance constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647258B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674637B (en) * | 2019-09-06 | 2023-07-11 | 腾讯科技(深圳)有限公司 | Character relationship recognition model training method, device, equipment and medium |
CN110909881B (en) * | 2019-11-01 | 2022-11-04 | 中电科大数据研究院有限公司 | Knowledge representation method for cross-media knowledge reasoning task |
CN111428047B (en) * | 2020-03-19 | 2023-04-21 | 东南大学 | Knowledge graph construction method and device based on UCL semantic indexing |
CN113220833A (en) * | 2021-05-07 | 2021-08-06 | 支付宝(杭州)信息技术有限公司 | Entity association degree identification method and device |
CN114330323B (en) * | 2022-03-08 | 2022-06-28 | 成都数联云算科技有限公司 | Entity relationship joint extraction method and device, computer terminal and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9251248B2 * | 2010-06-07 | 2016-02-02 | Microsoft Technology Licensing, LLC | Using context to extract entities from a document collection
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105630901A (en) * | 2015-12-21 | 2016-06-01 | 清华大学 | Knowledge graph representation learning method |
CN107122399A (en) * | 2017-03-16 | 2017-09-01 | 中国科学院自动化研究所 | Combined recommendation system based on Public Culture knowledge mapping platform |
CN107273349A (en) * | 2017-05-09 | 2017-10-20 | 清华大学 | A kind of entity relation extraction method and server based on multilingual |
Non-Patent Citations (3)
Title |
---|
Representation Learning of Knowledge Graphs with Entity Descriptions; Ruobing Xie et al.; Thirtieth AAAI Conference on Artificial Intelligence; 2016; 2659-2665 *
SSP: Semantic Space Projection for Knowledge Graph Embedding with Text Descriptions; Han Xiao et al.; Thirty-First AAAI Conference on Artificial Intelligence; 2017; 3104-3110 *
Survey on Techniques of Knowledge Graph Construction; Liu Qiao et al.; Journal of Computer Research and Development; 2016; Vol. 53, No. 03; 582-600 *
Also Published As
Publication number | Publication date |
---|---|
CN108647258A (en) | 2018-10-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647258B (en) | Representation learning method based on entity relevance constraint | |
CN113254599B (en) | Multi-label microblog text classification method based on semi-supervised learning | |
CN108536870B (en) | Text emotion classification method fusing emotional features and semantic features | |
CN110569508A (en) | Method and system for classifying emotional tendencies by fusing part-of-speech and self-attention mechanism | |
CN105975573B (en) | A kind of file classification method based on KNN | |
CN111144448A (en) | Video barrage emotion analysis method based on multi-scale attention convolutional coding network | |
CN109635109A (en) | Sentence classification method based on LSTM and combination part of speech and more attention mechanism | |
CN112395393B (en) | Remote supervision relation extraction method based on multitask and multiple examples | |
CN110619121B (en) | Entity relation extraction method based on improved depth residual error network and attention mechanism | |
CN110298403A (en) | The sentiment analysis method and system of enterprise dominant in a kind of financial and economic news | |
CN111581385A (en) | Chinese text type identification system and method for unbalanced data sampling | |
CN111274814B (en) | Novel semi-supervised text entity information extraction method | |
CN112395417A (en) | Network public opinion evolution simulation method and system based on deep learning | |
CN111506728B (en) | Hierarchical structure text automatic classification method based on HD-MSCNN | |
CN110334187A (en) | Burmese sentiment analysis method and device based on transfer learning | |
CN111914555B (en) | Automatic relation extraction system based on Transformer structure | |
CN113723083A (en) | Weighted negative supervision text emotion analysis method based on BERT model | |
CN113886562A (en) | AI resume screening method, system, equipment and storage medium | |
Chu et al. | Co-training based on semi-supervised ensemble classification approach for multi-label data stream | |
CN115630156A (en) | Mongolian emotion analysis method and system fusing Prompt and SRU | |
CN114548117A (en) | Cause-and-effect relation extraction method based on BERT semantic enhancement | |
CN113011192B (en) | Text emotion feature extraction method based on attention causal interpretation | |
CN116050419B (en) | Unsupervised identification method and system oriented to scientific literature knowledge entity | |
CN114943216B (en) | Case microblog attribute level view mining method based on graph attention network | |
CN111708896B (en) | Entity relationship extraction method applied to biomedical literature |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |