CN114003735A

CN114003735A - Knowledge graph question and answer oriented entity disambiguation method based on intelligence document

Info

Publication number: CN114003735A
Application number: CN202111595751.9A
Authority: CN
Inventors: 刘禹汐; 侯立旺; 姜青涛; 董勤娇; 王飞虎; 牟善强; 段龙海
Original assignee: Beijing Daoda Tianji Technology Co ltd
Current assignee: Beijing Daoda Tianji Technology Co ltd
Priority date: 2021-12-24
Filing date: 2021-12-24
Publication date: 2022-02-01
Anticipated expiration: 2041-12-24
Also published as: CN114003735B

Abstract

The invention relates to an entity disambiguation method for knowledge graph question answering based on intelligence documents, comprising the following steps: generating, through entity linking, a plurality of candidate entities corresponding to an entity mention; using the candidate entities as training data, constructing a neural-network-based ranking model with RankNet, and optimizing the ranking model by gradient descent; training the constructed ranking model by combining the BP (back-propagation) network algorithm with a conjugate gradient algorithm, and using the trained ranking model to select the candidate entity most relevant to the mention, thereby ruling out the other candidates. The core innovation of the invention is that the entity linking task is converted into an information retrieval problem: candidate entities are identified and generated by a conventional rule-and-dictionary method, and the mention is then linked to the most likely candidate entity by an improved LTR (learning to rank) method, so that entity disambiguation becomes an LTR problem in information retrieval and the candidates are disambiguated with a ranking model.

Description

Knowledge graph question and answer oriented entity disambiguation method based on intelligence document

Technical Field

The invention relates to the technical field of document entity linking, and in particular to an entity disambiguation method for knowledge graph question answering based on intelligence documents.

Background

Entity linking is a key technology in KBQA (knowledge base question answering) systems. It has important research significance and practical value in knowledge organization, information retrieval, semantic publishing and other fields, and can be widely applied to knowledge base expansion, machine translation, automatic question answering and similar functions of an information system. Entity linking for knowledge base question answering maps an entity mentioned in a natural language question to the corresponding entity in the knowledge graph, giving the mention a real and definite meaning; the main basis for establishing the link is the degree of match between the textual context and the entity in the specific knowledge base. Entity linking mainly addresses two problems of entities in natural language text, one word with several meanings and several words with one meaning, and so helps in understanding the specific meaning of a natural language question. Entity linking is therefore an important bridge between natural language and the knowledge base, and a prerequisite for understanding natural language questions.

Currently, entity linking is divided into two subtasks: candidate entity generation and candidate entity disambiguation. Candidate entity generation is the prerequisite, since the same mention may correspond to several entities in the knowledge base. The purpose of candidate entity disambiguation is to find, in the candidate entity set, the one candidate that best fits the context of the sentence and take it as the target entity. The final result can therefore be improved on both fronts. The current mainstream approach to candidate entity generation is to construct a dictionary that maps mentions to candidate entities, i.e. an alias dictionary covering acronyms, fuzzy matches, nicknames, misspellings and so on. The main approaches to candidate entity disambiguation include directly computing the similarity between the mention's context and the candidate entity's description text, graph-model-based methods, probabilistic-model-based methods, topic-model-based methods, and so on.

The main difficulty in entity linking research is entity ambiguity. Applying existing entity linking techniques to knowledge graph question answering has the following main disadvantages:

1) the mention lacks context: a question posed to a knowledge base contains only a few words and cannot provide enough context to assist entity disambiguation;

2) most short texts contain only one mention, so joint entity disambiguation methods cannot be used;

3) the structured knowledge base lacks textual descriptions of its entities, so the entities in the knowledge base are difficult to represent.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provides an entity linking method facing knowledge-graph question answering based on an intelligence document.

In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:

an entity disambiguation method for knowledge graph question answering based on intelligence documents, comprising the following steps:

step S1: generating, through entity linking, a plurality of candidate entities corresponding to an entity mention;

step S2: using the candidate entities as training data, constructing a neural-network-based ranking model through RankNet, and optimizing the ranking model by gradient descent;

step S3: training the constructed ranking model by combining the BP network back-propagation algorithm with a conjugate gradient algorithm, and using the trained ranking model to select the candidate entity most relevant to the mention, thereby ruling out the other candidate entities.

The step of generating, through entity linking, a plurality of candidate entities corresponding to the entity mention comprises:

identifying entity name boundaries with templates defined from preset spelling rules, word-formation rules, indicator words and prefix/suffix strings; and recognizing entity names using the entries of an existing dictionary, so as to generate a plurality of candidate entities corresponding to the entity mention.

The step of constructing a ranking model based on a neural network by using the candidate entities as training data through RankNet comprises the following steps:

manually labeling each candidate entity to obtain an idealized scoring function g;

using a neural network to compute, for every two candidate entities $x^u$ and $x^v$, the model probability $P^{u,v}$, and constructing the corresponding target probability $\bar{P}^{u,v}$;

computing the cross entropy between the target probability $\bar{P}^{u,v}$ and the model probability $P^{u,v}$, and defining this cross entropy as the loss function, thereby obtaining the loss function $C^{u,v}$.

The step of manually labeling the candidate entities to obtain an idealized scoring function g comprises: manually labeling each candidate entity with a score and comparing the scores of every two candidate entities; in each comparison, the candidate with the higher score is assigned +1 and the candidate with the lower score is assigned -1, which yields a ranking order over the candidate entities; the idealized scoring function g is obtained from this ranking order.

The step of using a neural network to compute the model probability $P^{u,v}$ for every two candidate entities $x^u$ and $x^v$, constructing the target probability $\bar{P}^{u,v}$, and obtaining the loss function $C^{u,v}$ from the two comprises:

given two candidate entities $x^u$ and $x^v$ in the training data, computing their scores with a neural network, where s denotes the prediction of the scoring function f:

$$s^u = f(x^u), \qquad s^v = f(x^v)$$

based on the scores computed by f for $x^u$ and $x^v$, defining the model probability $P^{u,v}$, a sigmoid function representing the probability that candidate entity $x^u$ ranks before candidate entity $x^v$, namely:

$$P^{u,v} = \frac{1}{1 + e^{-(s^u - s^v)}}$$

constructing the target probability $\bar{P}^{u,v}$ from the true annotations:

$$\bar{P}^{u,v} = \frac{1}{2}\left(1 + S^{u,v}\right)$$

where $S^{u,v} \in \{1, -1\}$: the value is 1 if candidate entity u is more relevant to the searched entity than candidate entity v, and -1 otherwise;

defining the loss function as the cross entropy between the target probability $\bar{P}^{u,v}$ and the model probability $P^{u,v}$ of a candidate entity pair, i.e.:

$$C^{u,v} = -\bar{P}^{u,v}\log P^{u,v} - \left(1 - \bar{P}^{u,v}\right)\log\left(1 - P^{u,v}\right)$$

from which it follows that:

$$C^{u,v} = \frac{1}{2}\left(1 - S^{u,v}\right)\left(s^u - s^v\right) + \log\left(1 + e^{-(s^u - s^v)}\right)$$

where $C^{u,v}$ is the loss function.

The step of constructing a neural-network-based ranking model through RankNet and optimizing the ranking model by gradient descent comprises:

using gradient descent as the optimization algorithm to learn the scoring function f: the loss is computed with the loss function $C^{u,v}$, and the weights w of the neural network are updated by gradient descent;

differentiating the loss function $C^{u,v}$ with respect to a weight w gives:

$$\frac{\partial C^{u,v}}{\partial w} = \frac{\partial C^{u,v}}{\partial s^u}\frac{\partial s^u}{\partial w} + \frac{\partial C^{u,v}}{\partial s^v}\frac{\partial s^v}{\partial w} = \left[\frac{1}{2}\left(1 - S^{u,v}\right) - \frac{1}{1 + e^{s^u - s^v}}\right]\left(\frac{\partial s^u}{\partial w} - \frac{\partial s^v}{\partial w}\right)$$

where the partial derivative of the score s with respect to the weight w depends on the specific learning model.
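The step from the cross entropy to the simplified loss can be checked by hand. The following worked derivation is a sketch under the standard RankNet conventions, which are assumptions here rather than formulas recovered from the patent images: scores $s^u = f(x^u)$ and $s^v = f(x^v)$, model probability $P^{u,v} = 1/(1 + e^{-(s^u - s^v)})$, target probability $\bar{P}^{u,v} = (1 + S^{u,v})/2$ with $S^{u,v} \in \{1, -1\}$, and loss symbol $C^{u,v}$.

$$
\begin{aligned}
-\log P^{u,v} &= \log\left(1 + e^{-(s^u - s^v)}\right),\\
-\log\left(1 - P^{u,v}\right) &= \left(s^u - s^v\right) + \log\left(1 + e^{-(s^u - s^v)}\right),\\
C^{u,v} &= \bar{P}^{u,v}\left(-\log P^{u,v}\right) + \left(1 - \bar{P}^{u,v}\right)\left(-\log\left(1 - P^{u,v}\right)\right)\\
&= \left(1 - \bar{P}^{u,v}\right)\left(s^u - s^v\right) + \log\left(1 + e^{-(s^u - s^v)}\right)\\
&= \frac{1}{2}\left(1 - S^{u,v}\right)\left(s^u - s^v\right) + \log\left(1 + e^{-(s^u - s^v)}\right).
\end{aligned}
$$

For $S^{u,v} = 1$ the loss reduces to $\log\left(1 + e^{-(s^u - s^v)}\right)$, which is small when $x^u$ is scored well above $x^v$; for $S^{u,v} = -1$ it reduces to $\log\left(1 + e^{s^u - s^v}\right)$, which penalizes the same ordering.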

The step of training the constructed ranking model by combining the BP network back-propagation algorithm with the conjugate gradient algorithm comprises:

obtaining the output error E of the ranking model with a BP network:

$$E = \frac{1}{2}\sum_{k=1}^{l}\left(d_k - o_k\right)^2 = \frac{1}{2}\sum_{k=1}^{l}\left[d_k - f\left(\sum_{j=0}^{m} w_{jk}\, y_j\right)\right]^2$$

where $y_j = f\left(\sum_{i=0}^{n} v_{ij}\, x_i\right)$ is the function of the hidden layer of the BP network, $o_k = f\left(\sum_{j=0}^{m} w_{jk}\, y_j\right)$ is the function of the output layer of the BP network, $w_{jk}$ and $v_{ij}$ are the weights of the output layer and the hidden layer respectively, and $d$ is the expected output vector; k denotes the k-th output neuron, j the j-th hidden-layer neuron, and i the i-th input;

assuming the initial weight of the output layer is w(0) and the initial weight of the hidden layer is v(0), the initial search direction $d^{(0)}$ is the negative gradient direction $h^{(0)}$, namely:

$$d^{(0)} = h^{(0)} = -\nabla E\left(w(0), v(0)\right)$$

the weight adjustment is proportional to the negative gradient of the error, namely:

$$\Delta w_{jk} = -\eta\,\frac{\partial E}{\partial w_{jk}}, \qquad \Delta v_{ij} = -\eta\,\frac{\partial E}{\partial v_{ij}}$$

where j = 0, 1, 2, ..., m; k = 1, 2, ..., l; i = 0, 1, 2, ..., n; the negative sign indicates gradient descent, and the constant $\eta$ is a proportionality factor that reflects the learning rate in training;

the step length (learning rate) of each iteration of the conjugate gradient algorithm is:

$$\alpha^{(t)} = -\frac{\left(\nabla E^{(t)}\right)^{\mathsf T} d^{(t)}}{\left(d^{(t)}\right)^{\mathsf T} Q\, d^{(t)}}$$

where Q is an n × n real symmetric matrix, $\nabla E^{(t)}$ is the gradient at iteration t, and the search directions $d^{(0)}, d^{(1)}, \dots, d^{(n-1)}$ are mutually conjugate with respect to Q.

Compared with the prior art, the invention has the beneficial effects that:

The core innovation of the invention is that the entity linking task is converted into an information retrieval problem: candidate entities are identified and generated by a conventional rule-and-dictionary method, and the mention is then linked to the most likely candidate entity by an improved LTR (learning to rank) method, so that entity disambiguation becomes an LTR problem in information retrieval and the candidates are disambiguated with a ranking model.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

FIG. 1 is a flow chart of a method for entity linking according to the present invention;

FIG. 2 is a schematic diagram of a PairWise sorting method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a conjugate direction vector according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Also, in the description of the present invention, the terms "first", "second", and the like are used for distinguishing between descriptions and not necessarily for describing a relative importance or implying any actual relationship or order between such entities or operations.

Example:

The invention is realized by the following technical scheme. As shown in FIG. 1, the entity disambiguation method for knowledge graph question answering based on intelligence documents is a technique for entity linking in knowledge base question answering, with the emphasis on the entity disambiguation part. The innovation of the scheme is that the entity linking task is converted into an information retrieval problem: candidate entities are identified and generated by a conventional rule-and-dictionary method, and the mention is then linked to the most likely candidate entity by an improved LTR (learning to rank) method, so that entity disambiguation becomes an LTR problem in information retrieval and the candidates are disambiguated with a ranking model.

The method comprises the following steps:

Step S1: generating, through entity linking, a plurality of candidate entities corresponding to an entity mention.

Candidate entities are generated with a conventional approach, mainly rules and dictionaries. After word segmentation and part-of-speech tagging of the text, a rule-based method identifies entity name boundaries using templates defined from preset spelling rules, word-formation rules, indicator words and prefix/suffix strings; entity names are then recognized using the entries of an existing dictionary, and candidate entities are generated.

Target entities in open-source information are expressed in diverse forms, including aliases, abbreviations and nicknames; according to statistics, each named entity has on average 3.3 different surface forms. To handle this diversity, the scheme obtains all candidate entities corresponding to an entity mention by linking it against sources such as Wikipedia (Chinese edition), Interactive Encyclopedia and Baidu Encyclopedia, and then summarizes the different surface forms of each entity to build a synonym table. For example, linking the entity name "Zhang San" may return a male swimmer, a university professor, an enterprise department manager and so on; these are the candidate entities corresponding to the name "Zhang San", and together they form the candidate entity set.
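The dictionary lookup and synonym table described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `ALIAS_TABLE` entries, the function names and the whitespace/case normalization are all hypothetical, and a real system would populate the table from the encyclopedia sources mentioned above.

```python
from collections import defaultdict

# Hypothetical alias dictionary: surface form -> candidate entities.
# The entries are invented purely for illustration.
ALIAS_TABLE = {
    "zhang san": [
        "Zhang San (swimmer)",
        "Zhang San (professor)",
        "Zhang San (department manager)",
    ],
    "z. san": ["Zhang San (professor)"],
}


def normalize(mention):
    """Lower-case and collapse whitespace so aliases compare consistently."""
    return " ".join(mention.lower().split())


def generate_candidates(mention, alias_table):
    """Return the candidate entity set for a mention (empty if unknown)."""
    return list(alias_table.get(normalize(mention), []))


def build_synonym_table(alias_table):
    """Invert the alias table: entity -> sorted list of its surface forms."""
    synonyms = defaultdict(set)
    for alias, entities in alias_table.items():
        for entity in entities:
            synonyms[entity].add(alias)
    return {entity: sorted(forms) for entity, forms in synonyms.items()}


candidates = generate_candidates("Zhang  San", ALIAS_TABLE)
synonym_table = build_synonym_table(ALIAS_TABLE)
```

Here `candidates` holds the three hypothetical "Zhang San" entities, which is the candidate entity set that the ranking model later has to order.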

Step S2: using the candidate entities as training data, constructing a neural-network-based ranking model through RankNet, and optimizing the ranking model by gradient descent.

In this step, the feature text built from the entity mention and the candidate entity set is used for selection and prediction; for example, the step decides which candidate in the candidate entity set the searched "Zhang San" is most relevant to, and rules out the other candidate entities.

The scheme converts the entity disambiguation task into an LTR problem in information retrieval and adopts an improved PairWise method. This method shifts the focus to the order relation between candidate entities: its input is candidate entities and its output is the local preference relation among them. The ranking problem is thereby reduced to a binary classification problem, which can be handled by classifiers such as boosting, SVMs and neural networks.

In detail, referring to FIG. 2, assume that an entity mention corresponds to three candidate entities. After the three candidates are labeled manually, $d_1 = 5$ has the highest score, followed by $d_2 = 3$, and the worst is $d_3 = 2$. These scores are converted into relative relations:

$d_1 > d_2$, $d_1 > d_3$, $d_2 > d_3$

For any two candidate entities with different labels, a training instance $(d_i, d_j)$ can be obtained: if $d_i > d_j$, then $d_i$ is assigned +1 and $d_j$ is assigned -1. This yields the training samples needed to train a binary classifier, and the relevance ranking can in turn be recovered from the pairwise order relations. Since the scores are labeled manually, they can be regarded as the standard answer, which is equivalent to assuming that an optimal scoring function g exists; the next task is to construct another scoring function f whose scores agree with those of g as closely as possible. At test time, only the scoring function f is used to classify the candidate entities, which gives an order relation over all candidates and thereby realizes the ranking.
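The conversion from labeled scores to pairwise training instances can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `pairwise_instances` is hypothetical, and the label is attached to the ordered pair (+1 when the first element should rank higher, -1 otherwise) rather than to each candidate separately; ties are skipped because they carry no order information.

```python
from itertools import combinations


def pairwise_instances(scores):
    """Turn per-candidate relevance scores into (a, b, label) triples.

    label is +1 when a is scored higher than b and -1 when it is scored
    lower; equal scores are skipped.
    """
    instances = []
    for a, b in combinations(scores, 2):
        if scores[a] == scores[b]:
            continue
        instances.append((a, b, 1 if scores[a] > scores[b] else -1))
    return instances


# Manual labels from the example: d1 = 5, d2 = 3, d3 = 2.
labels = {"d1": 5, "d2": 3, "d3": 2}
pairs = pairwise_instances(labels)
```

With the example labels, every generated pair carries the label +1, matching the relations $d_1 > d_2$, $d_1 > d_3$ and $d_2 > d_3$.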

PairWise converts the ranking problem for candidate entities into a sequence of order judgments. PairWise has many implementations, such as Ranking SVM, RankNet, FRank and RankBoost; this scheme improves RankNet, the neural-network-based learning-to-rank method. RankNet is a representative neural-network-based learning-to-rank method: it operates on pairs of candidate entities and defines a probability-based loss function, and this scheme uses a neural network and gradient descent to minimize the cross-entropy loss function.

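Given two associated candidate entities $x^u$ and $x^v$ in the training data, a neural network computes their scores $s^u = f(x^u)$ and $s^v = f(x^v)$, and the model probability that $x^u$ ranks before $x^v$ is the sigmoid function:

$$P^{u,v} = \frac{1}{1 + e^{-(s^u - s^v)}}$$

The target probability $\bar{P}^{u,v}$ is constructed from the true annotations:

$$\bar{P}^{u,v} = \frac{1}{2}\left(1 + S^{u,v}\right)$$

where $S^{u,v} \in \{1, -1\}$: the value is 1 if candidate entity u is more relevant to the searched entity than candidate entity v, and -1 otherwise.

The loss function is defined as the cross entropy between the target probability $\bar{P}^{u,v}$ and the model probability $P^{u,v}$ of a candidate entity pair, i.e.:

$$C^{u,v} = -\bar{P}^{u,v}\log P^{u,v} - \left(1 - \bar{P}^{u,v}\right)\log\left(1 - P^{u,v}\right)$$

from which it follows that:

$$C^{u,v} = \frac{1}{2}\left(1 - S^{u,v}\right)\left(s^u - s^v\right) + \log\left(1 + e^{-(s^u - s^v)}\right)$$

The ultimate goal is to minimize the sum of $C^{u,v}$ over all candidate entity pairs. A neural network is used for modeling, and gradient descent is used as the optimization algorithm to learn the scoring function f. The loss is computed with the loss function $C^{u,v}$, and the weights w of the neural network are updated by gradient descent. Differentiating the loss function C with respect to a weight w gives:

$$\frac{\partial C^{u,v}}{\partial w} = \frac{\partial C^{u,v}}{\partial s^u}\frac{\partial s^u}{\partial w} + \frac{\partial C^{u,v}}{\partial s^v}\frac{\partial s^v}{\partial w} = \left[\frac{1}{2}\left(1 - S^{u,v}\right) - \frac{1}{1 + e^{s^u - s^v}}\right]\left(\frac{\partial s^u}{\partial w} - \frac{\partial s^v}{\partial w}\right)$$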

In the above formula, the partial derivative of the score s with respect to the weight w depends on the specific learning model. The original RankNet method uses a neural network model and finds the ranking model by gradient descent. Cross entropy is used as the loss function because it is easy to differentiate and fits naturally into a gradient descent framework.
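The pairwise loss and its gradient-descent update can be sketched in a few lines. This is an illustration under stated assumptions, not the patent's model: a linear scoring function f(x) = w · x stands in for the neural network, the loss is the standard RankNet cross entropy with a label of +1 or -1 per pair, and the feature vectors, learning rate and epoch count are invented for the example.

```python
import math


def score(w, x):
    """Linear scoring function f(x) = w . x (stand-in for the neural network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pair_loss_and_grad(w, xu, xv, s_uv):
    """RankNet cross-entropy loss for one pair and its gradient w.r.t. w.

    s_uv is +1 if xu should rank before xv and -1 otherwise.
    """
    diff = score(w, xu) - score(w, xv)
    # C = 0.5 * (1 - S) * diff + log(1 + exp(-diff))
    loss = 0.5 * (1 - s_uv) * diff + math.log1p(math.exp(-diff))
    # dC/d(diff) = 0.5 * (1 - S) - 1 / (1 + exp(diff))
    # (plain exp is fine for the small scores used here; large |diff|
    # would need a numerically stable formulation)
    lam = 0.5 * (1 - s_uv) - 1.0 / (1.0 + math.exp(diff))
    # For a linear scorer, d(diff)/dw = xu - xv.
    grad = [lam * (a - b) for a, b in zip(xu, xv)]
    return loss, grad


def train(pairs, dim, lr=0.1, epochs=200):
    """Plain stochastic gradient descent over the labeled pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for xu, xv, s_uv in pairs:
            _, grad = pair_loss_and_grad(w, xu, xv, s_uv)
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical two-dimensional features for three candidates d1 > d2 > d3.
features = {"d1": [1.0, 0.0], "d2": [0.5, 0.5], "d3": [0.0, 1.0]}
pairs = [
    (features["d1"], features["d2"], 1),
    (features["d1"], features["d3"], 1),
    (features["d2"], features["d3"], 1),
]
w = train(pairs, dim=2)
ranking = sorted(features, key=lambda name: score(w, features[name]), reverse=True)
```

After training, sorting the candidates by their learned score reproduces the labeled order, which is how the ranking model is used at test time to pick the top candidate.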

This scheme improves and optimizes the way the ranking model is trained. The BP network back-propagation algorithm (hereinafter the BP network) adjusts the weights and thresholds along the negative gradient of the network performance function. With this method the search directions of adjacent iterations are orthogonal, so oscillation tends to occur near an extremum; although the BP network reduces the value of the network performance function at the fastest local rate, it is prone to falling into a local minimum, which is a known weakness of the BP learning algorithm.

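The output error E of the ranking model is obtained with a BP network:

$$E = \frac{1}{2}\sum_{k=1}^{l}\left(d_k - o_k\right)^2 = \frac{1}{2}\sum_{k=1}^{l}\left[d_k - f\left(\sum_{j=0}^{m} w_{jk}\, y_j\right)\right]^2$$

where $y_j = f\left(\sum_{i=0}^{n} v_{ij}\, x_i\right)$ is the function of the hidden layer of the BP network, $o_k = f\left(\sum_{j=0}^{m} w_{jk}\, y_j\right)$ is the function of the output layer of the BP network, $w_{jk}$ and $v_{ij}$ are the weights of the output layer and the hidden layer respectively, and $d$ is the expected output vector; k denotes the k-th output neuron, j the j-th hidden-layer neuron, and i the i-th input. It can be seen that adjusting the weights changes the output error E.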
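The output error of a small BP network can be computed as in the sketch below. It assumes the usual single-hidden-layer form, with hidden activations y_j = f(Σ_i v[i][j] x_i), outputs o_k = f(Σ_j w[j][k] y_j) and error E = ½ Σ_k (d_k − o_k)²; the sigmoid activation, the omission of bias terms and all function names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation, used here as the transfer function f."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, v, w):
    """Forward pass: v[i][j] are hidden-layer weights, w[j][k] output-layer weights."""
    hidden = [
        sigmoid(sum(v[i][j] * x[i] for i in range(len(x))))
        for j in range(len(v[0]))
    ]
    output = [
        sigmoid(sum(w[j][k] * hidden[j] for j in range(len(hidden))))
        for k in range(len(w[0]))
    ]
    return hidden, output


def output_error(x, d, v, w):
    """E = 1/2 * sum_k (d_k - o_k)^2 for expected output vector d."""
    _, o = forward(x, v, w)
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o))


# Two inputs, two hidden neurons, one output, all weights zero.
v0 = [[0.0, 0.0], [0.0, 0.0]]
w0 = [[0.0], [0.0]]
error0 = output_error([1.0, 2.0], [1.0], v0, w0)
```

With all weights at zero every neuron outputs 0.5, so the error against the target 1.0 is ½ · 0.5² = 0.125; changing any weight changes this value, which is what the training procedure exploits.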

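The weights are adjusted so as to reduce the error continuously. To train the BP network better, a conjugate gradient algorithm is combined with the training of the BP network (see FIG. 3). The main algorithm is as follows:

Assuming the initial weight of the output layer is w(0) and the initial weight of the hidden layer is v(0), the initial search direction $d^{(0)}$ is the negative gradient direction $h^{(0)}$, namely:

$$d^{(0)} = h^{(0)} = -\nabla E\left(w(0), v(0)\right)$$

The adjustment of the output-layer weights $w_{jk}$ and the hidden-layer weights $v_{ij}$ is proportional to the negative gradient of the error, namely:

$$\Delta w_{jk} = -\eta\,\frac{\partial E}{\partial w_{jk}}, \qquad \Delta v_{ij} = -\eta\,\frac{\partial E}{\partial v_{ij}}$$

where j = 0, 1, 2, ..., m; k = 1, 2, ..., l; i = 0, 1, 2, ..., n; the negative sign indicates gradient descent, and the constant $\eta$ is a proportionality factor that reflects the learning rate in training.

The step length (learning rate) of each iteration of the conjugate gradient algorithm is:

$$\alpha^{(t)} = -\frac{\left(\nabla E^{(t)}\right)^{\mathsf T} d^{(t)}}{\left(d^{(t)}\right)^{\mathsf T} Q\, d^{(t)}}$$

where Q is an n × n real symmetric matrix, $\nabla E^{(t)}$ is the gradient at iteration t, and the search directions $d^{(0)}, d^{(1)}, \dots, d^{(n-1)}$ are mutually conjugate with respect to Q.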

The first search direction is the negative gradient direction; subsequent iterations search along conjugate directions to reach the minimum. The conjugate directions are generated as the iteration proceeds: in each iteration a new direction is constructed as a linear combination of the gradient vector at the current iterate and the previous search direction, so that the new direction is conjugate to the previously generated search directions.

The above applies to quadratic objective functions. For a non-quadratic function, the objective is approximated to second order by a Taylor expansion, and the combination coefficient is computed with the Fletcher-Reeves formula. In the implementation, a pure steepest-descent step is taken every n steps as an interpolation (spacer) step; because the other steps do not increase the objective function, global convergence of the algorithm is guaranteed. An interpolation step here means that, within one algorithm, a step of another algorithm is inserted; the only requirement is that the steps do not increase the value of the descent function, which ensures convergence of the composite process.

The improved conjugate gradient method guarantees global convergence while preserving the convergence speed of the algorithm, and overcomes the tendency of BP network learning to fall into a local minimum.
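The search scheme described above (first step along the negative gradient, Fletcher-Reeves conjugate directions afterwards, and a pure steepest-descent step every n steps) can be sketched on the quadratic case. This is an illustration only: it minimizes 0.5 · xᵀQx − bᵀx for a symmetric positive-definite Q with an exact line search, whereas training a BP network would apply the same direction updates to the network error; the function names, the test matrix and the tolerance are assumptions.

```python
def matvec(Q, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in Q]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(Q, b, x0, tol=1e-20, max_iter=100):
    """Minimise 0.5 * x^T Q x - b^T x for symmetric positive-definite Q.

    tol is a threshold on the squared gradient norm.
    """
    n = len(b)
    x = list(x0)
    g = [qi - bi for qi, bi in zip(matvec(Q, x), b)]  # gradient Qx - b
    d = [-gi for gi in g]                             # first direction: negative gradient
    for k in range(max_iter):
        gg = dot(g, g)
        if gg < tol:
            break
        Qd = matvec(Q, d)
        alpha = -dot(g, d) / dot(d, Qd)               # exact step length along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * qdi for gi, qdi in zip(g, Qd)]
        if (k + 1) % n == 0:
            d = [-gi for gi in g_new]                 # steepest-descent step every n steps
        else:
            beta = dot(g_new, g_new) / gg             # Fletcher-Reeves coefficient
            d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
    return x


Q = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = conjugate_gradient(Q, b, [2.0, 1.0])
```

For this two-dimensional quadratic the minimizer is the solution of Qx = b, and the conjugate directions reach it in two line searches, which illustrates why the method avoids the zig-zag of plain gradient descent near an extremum.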

Thus, the trained ranking model can rank the generated candidate entities, convert the entity disambiguation task into the LTR problem in information retrieval, and utilize the ranking model to disambiguate the candidate entities.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An entity disambiguation method facing knowledge graph question answering based on intelligence documents is characterized in that: the method comprises the following steps:

2. The intelligence document-based knowledge-graph question-answer oriented entity disambiguation method of claim 1, characterized in that: the step of generating a plurality of candidate entities corresponding to the entity through the entity link includes:

3. The intelligence document-based knowledge-graph question-answer oriented entity disambiguation method of claim 1, characterized in that: the step of constructing a ranking model based on a neural network by using the candidate entities as training data through RankNet comprises the following steps:

computing the cross entropy between the target probability $\bar{P}^{u,v}$ and the model probability $P^{u,v}$, thereby obtaining the loss function $C^{u,v}$.

4. The intelligence document-based knowledge-graph question-answer oriented entity disambiguation method of claim 3, characterized in that: the step of obtaining an idealized scoring function g after the candidate entities are respectively manually labeled comprises: respectively carrying out manual marking on the candidate entities, comparing scores of every two candidate entities, and assigning a value of +1 to the candidate entity with a higher score when the scores of every two candidate entities are compared, otherwise assigning a value of-1 to the candidate entity with a lower score, thereby forming a sequencing order of the candidate entities; and obtaining an idealized scoring function g according to the sorting order of the candidate entities.

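5. The intelligence document-based knowledge-graph question-answer oriented entity disambiguation method of claim 3, characterized in that: the step of using a neural network to compute the model probability $P^{u,v}$ for every two candidate entities $x^u$ and $x^v$, constructing the target probability $\bar{P}^{u,v}$, and obtaining the loss function $C^{u,v}$ from the two comprises:

given two candidate entities $x^u$ and $x^v$ in the training data, with scores $s^u = f(x^u)$ and $s^v = f(x^v)$ predicted by the scoring function f and model probability $P^{u,v} = 1/\left(1 + e^{-(s^u - s^v)}\right)$;

constructing the target probability $\bar{P}^{u,v}$ from the true annotations:

$$\bar{P}^{u,v} = \frac{1}{2}\left(1 + S^{u,v}\right)$$

where $S^{u,v} \in \{1, -1\}$: the value is 1 if candidate entity u is more relevant to the searched entity than candidate entity v, and -1 otherwise;

defining the loss function as the cross entropy between the target probability $\bar{P}^{u,v}$ and the model probability $P^{u,v}$ of a candidate entity pair, i.e.:

$$C^{u,v} = -\bar{P}^{u,v}\log P^{u,v} - \left(1 - \bar{P}^{u,v}\right)\log\left(1 - P^{u,v}\right)$$

from which it follows that:

$$C^{u,v} = \frac{1}{2}\left(1 - S^{u,v}\right)\left(s^u - s^v\right) + \log\left(1 + e^{-(s^u - s^v)}\right)$$

where $C^{u,v}$ is the loss function.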

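6. The intelligence document-based knowledge-graph question-answer oriented entity disambiguation method of claim 5, characterized in that: the step of constructing a neural-network-based ranking model through RankNet and optimizing the ranking model by gradient descent comprises:

differentiating the loss function $C^{u,v}$ with respect to a weight w, where $s^u$ and $s^v$ are the scores of the two candidate entities and $S^{u,v} \in \{1, -1\}$ is the annotated preference, to obtain:

$$\frac{\partial C^{u,v}}{\partial w} = \left[\frac{1}{2}\left(1 - S^{u,v}\right) - \frac{1}{1 + e^{s^u - s^v}}\right]\left(\frac{\partial s^u}{\partial w} - \frac{\partial s^v}{\partial w}\right)$$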

7. The intelligence document-based knowledge-graph question-answer oriented entity disambiguation method of claim 6, characterized in that: the step of training the constructed ranking model by combining the BP network back propagation algorithm and the conjugate gradient algorithm comprises the following steps:

obtaining the output error E of the sequencing model by using a BP network: