CN111897944A - Knowledge graph question-answering system based on semantic space sharing - Google Patents
Knowledge graph question-answering system based on semantic space sharing
- Publication number
- CN111897944A (application number CN202010827800.6A)
- Authority
- CN
- China
- Prior art keywords
- question
- entity
- natural language
- character
- relation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Probability & Statistics with Applications (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Machine Translation (AREA)
Abstract
A knowledge-graph question-answering system based on semantic space sharing belongs to the technical field of Chinese knowledge-graph question answering. The invention solves the problem that the accuracy of the obtained answer entity is limited by insufficient information sharing among the modules of conventional knowledge-graph question-answering systems. The invention jointly trains a BERT pre-trained language model on the training data of the question main entity recognition sub-module, the entity linking sub-module and the relation prediction sub-module, and embeds the jointly trained model into each sub-module so that the sub-modules share one semantic space. The system guarantees that the question main entity recognition sub-module identifies exactly one main entity in the natural language question, and the semantic information shared among the sub-modules effectively improves the accuracy of the obtained answer entity. Experiments show that the accuracy of the answer entities obtained by the method reaches 86.64%. The invention can be applied to knowledge-graph question answering.
Description
Technical Field
The invention belongs to the technical field of Chinese knowledge graph question answering, and in particular relates to a knowledge-graph question-answering system based on semantic space sharing.
Background
Knowledge-graph question answering is a specialized form of automatic question answering that uses a knowledge graph as its knowledge source and aims to automatically return the correct answer to a given natural language question. The technology provides a natural and direct way of accessing the massive amount of information stored in knowledge graphs. Many different knowledge-graph question-answering models exist, and they can be divided into two main categories.
The first class of models is based on semantic parsing. Such a model performs a detailed semantic analysis of the natural language question, converts it into a structured logical representation such as a SPARQL query, and executes that query on the knowledge graph to obtain the answer directly. Traditional semantic parsing methods rely on manually annotated logical expressions as supervision, and are therefore limited to the few relational predicates covered by the annotated data. Weakly supervised learning with external resources, for example pattern matching, can broaden the coverage of semantic parsing models while serving the same purpose.
The second class of models is based on information retrieval. Such a model first retrieves all possible candidate triples (head entity, relation, tail entity) from the knowledge graph and then ranks the candidates with machine learning or deep learning methods; the top-ranked triple is the prediction. Because this approach generally requires no manually designed rules or features, it generalizes better and is more suitable for large-scale knowledge graphs.
Pre-trained language models: a pre-trained language model is first trained on a large-scale corpus to obtain a general-purpose language representation, which is then used to improve the performance of other natural language processing tasks. With the development of deep learning, various deep neural networks have been successfully applied to natural language processing; however, compared with some large-scale deep neural networks in computer vision, the deep learning models used in natural language processing have remained relatively small. An important reason is that natural language processing tasks generally lack large amounts of labeled training data. The research community has therefore focused on how to introduce additional prior knowledge into natural language processing through larger-scale pre-training.
With the increase in computing power brought by new hardware such as TPUs (Tensor Processing Units) and the emergence of more complex network structures, a series of deep pre-trained language models has appeared. These models share a common recipe: a sufficiently complex neural network is pre-trained on a large unlabeled corpus with a specially designed training objective to obtain a high-quality general language representation, which is then transferred to improve the performance of other natural language processing tasks; the models differ in network structure, pre-training objective and transfer method. Among these models, BERT has had the greatest influence.
BERT is an abbreviation of Bidirectional Encoder Representations from Transformers, a pre-trained language model that has achieved outstanding results in the NLP field. It trains deep bidirectional representations by jointly conditioning on both left and right context in all layers of a Transformer encoder.
Before BERT was proposed, there were two main ways of applying pre-trained language representations to downstream tasks. One is feature-based, e.g. ELMo, which feeds the pre-trained representation into the downstream task's network as additional features. The other is fine-tuning, e.g. OpenAI GPT, which trains a complete neural language model in the pre-training phase, adds task-specific parameters on top of it in the fine-tuning phase, and fine-tunes all parameters of the whole network. The language representation obtained by the feature-based method is fixed, whereas the fine-tuning method can optimize the parameters of the entire pre-trained language model for the target task, giving the model stronger expressive power; academic research also indicates that the second method performs better. BERT argues that the two earlier models share a common limitation: their architectures are unidirectional. ELMo does use a bidirectional LSTM, but in practice it merely concatenates a left-to-right representation with a right-to-left representation, which is regarded as a shallow combination rather than a truly deep bidirectional representation. OpenAI GPT adopts a Transformer decoder structure in which the current word can only attend to information on its left, so it is also unidirectional. BERT uses two training tasks, the masked LM and NSP (next sentence prediction), to let the model learn a deep bidirectional language representation: the masked LM predicts masked words from their context, giving the model access to bidirectional information, while the NSP task predicts the order of sentence pairs, helping the model understand sentence-pair semantics and inter-sentence relationships. BERT set new state-of-the-art results on eleven natural language processing tasks and is regarded as a milestone.
A wide variety of pre-trained language models has since been proposed. Although these models differ in their technical details, their core contribution is the same: learning a high-quality vector representation of natural language from large-scale unlabeled text with a very complex neural network. The complexity of the neural network underpins the expressive power of the pre-trained language model; for example, the T5 model, currently the best performer on the GLUE benchmark, has about 11 billion parameters. Very large training corpora provide the model with massive amounts of natural language and sufficient coverage of common linguistic phenomena; the T5 model, for example, uses 750 GB of training data. Compared with earlier methods such as word vectors or supervised neural networks, pre-trained language models greatly increase the amount of corpus data that can be used effectively and improve model expressiveness, thereby improving performance on a wide range of natural language processing tasks.
In summary, although the prior art has made progress in knowledge-graph question answering, the accuracy of the obtained answer entity remains limited because semantic information is insufficiently shared among the modules of existing knowledge-graph question-answering systems.
Disclosure of Invention
The invention aims to solve the problem that the accuracy of an obtained answer entity is limited due to insufficient information sharing among modules in the conventional knowledge-graph question-answering system, and provides a knowledge-graph question-answering system based on semantic space sharing.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a knowledge-graph question-answering system based on semantic space sharing comprises a question main entity identification submodule, an entity linking submodule and a relation prediction submodule, wherein:
the question main entity recognition sub-module, the entity link sub-module and the relation prediction sub-module are internally embedded with a BERT pre-training language model; the BERT pre-training language model is obtained through the joint training of three sub-modules;
the question main entity identification submodule is used for encoding the input natural language question, obtaining a vector representation of each character in the natural language question, and determining the starting position and the ending position of the main entity from these vector representations, thereby obtaining the main entity in the input natural language question;
the entity link submodule is used for predicting the entity name of the main entity in the knowledge graph in the input natural language question;
the relation prediction submodule is used for predicting the relation name of a relation predicate in the input natural language question in the knowledge graph;
in the knowledge graph, a tail entity connected with the predicted entity name through the predicted relation predicate is an answer entity.
The invention has the following beneficial effects: it provides a knowledge-graph question-answering system based on semantic space sharing. The system guarantees that the question main entity recognition sub-module identifies exactly one main entity in the natural language question, and the semantic information shared among the sub-modules effectively improves the accuracy of the obtained answer entity. Experiments show that the accuracy of the answer entities obtained by the method reaches 86.64%.
Drawings
FIG. 1 is a schematic diagram of a knowledge-graph question-answering system based on semantic space sharing according to the present invention.
Detailed Description
First embodiment: this embodiment is described with reference to FIG. 1. The knowledge-graph question-answering system based on semantic space sharing described in this embodiment includes a question main entity recognition sub-module, an entity linking sub-module and a relation prediction sub-module, in which:
the question main entity recognition sub-module, the entity link sub-module and the relation prediction sub-module are internally embedded with a BERT pre-training language model; the BERT pre-training language model is obtained through the joint training of three sub-modules;
the question main entity identification submodule is used for encoding the input natural language question, obtaining a vector representation of each character in the natural language question, and determining the starting position and the ending position of the main entity from these vector representations, thereby obtaining the main entity in the input natural language question;
the entity link submodule is used for predicting the entity name of the main entity in the knowledge graph in the input natural language question;
the relation prediction submodule is used for predicting the relation name of a relation predicate in the input natural language question in the knowledge graph;
in the knowledge graph, a tail entity connected with the predicted entity name through the predicted relation predicate is an answer entity.
Joint training: multi-task joint training refers to using the training data and the corresponding training objectives of multiple tasks to optimize one larger shared model. By jointly optimizing several training objectives, multi-task joint training avoids overfitting to any single task and improves the generalization of the model on every task that participates in training. The technique is widely used across artificial intelligence, for example in natural language processing, computer vision and speech recognition.
There are two main ways of implementing multi-task joint training in deep learning. The first shares the same model structure and most of the model parameters among all tasks: the hidden layers of the neural network are shared across tasks, while each task retains its own task-specific output layer. Sharing hidden layers across multiple training tasks greatly reduces overfitting; intuitively, the shared hidden layers must learn a generic sample representation that serves all tasks and therefore cannot overfit the training data of any single task.
The second way is to maintain a separate set of neural network parameters for each task, but to add a regularization term that keeps the parameters of any two tasks from diverging too much. The similarity of parameters across tasks can be maintained with, for example, an L2 penalty or the trace norm of a parameter matrix.
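For illustration, a minimal PyTorch-style sketch of the first (hard parameter sharing) way is given below; the checkpoint name "bert-base-chinese", the layer sizes and the design of the task heads are assumptions for the example and are not prescribed by the invention.

```python
import torch.nn as nn
from transformers import BertModel  # assumes the HuggingFace transformers package is available

class SharedEncoderMultiTask(nn.Module):
    """Hard parameter sharing: one BERT encoder shared by all tasks, one small output layer per task."""
    def __init__(self, bert_name="bert-base-chinese", hidden=768):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)  # shared hidden layers
        self.span_head = nn.Linear(hidden, 2)   # per-character start/end scores (main entity recognition)
        self.link_head = nn.Linear(hidden, 1)   # pair-correctness score (entity linking)
        self.rel_head = nn.Linear(hidden, 1)    # pair-correctness score (relation prediction)

    def forward(self, input_ids, attention_mask, task):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        if task == "span":
            return self.span_head(out.last_hidden_state)   # shape [batch, length, 2]
        pooled = out.last_hidden_state[:, 0]                # [CLS] vector for the sentence-pair tasks
        return self.link_head(pooled) if task == "link" else self.rel_head(pooled)
```

Because all three heads read the same encoder, gradients from every task update the shared BERT parameters, which is exactly the kind of semantic-space sharing exploited by the invention.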
Deep learning models in natural language processing can be jointly trained in the same ways. On this basis, researchers have also proposed methods tailored to the characteristics of natural language processing tasks. For example, natural language processing tasks have a clear hierarchy: tasks such as part-of-speech tagging and named entity recognition are typically used in the data preprocessing stage and require only a shallow analysis of the text, whereas tasks such as textual entailment, machine translation and reading comprehension are generally considered to require a deeper understanding of natural language text. Accordingly, in some cases shallow tasks should rely on the shallow hidden outputs of the neural network model, while tasks with higher demands on semantic understanding should rely on its deep hidden outputs.
To achieve high-performance, broad-coverage semantic computation, the invention uses a pre-trained language model as the basic semantic computation technique and applies joint fine-tuning of the pre-trained language model to share information across the several subtasks of knowledge-graph question answering, thereby improving the question-answering results.
The second embodiment differs from the first embodiment as follows: the BERT pre-trained language model is obtained by joint training of the three sub-modules. The training data of the question main entity recognition sub-module consist of natural language questions and the main entity in each question. The training data of the entity linking sub-module consist of natural language questions together with correct and incorrect entity names of the main entity in the knowledge graph; the entity name of the correct main entity in the knowledge graph is used as a positive sample, and entity names of incorrect main entities are used as negative samples. The training data of the relation prediction sub-module consist of natural language questions together with correct and incorrect relation names of the relation predicate in the knowledge graph; the relation name of the correct relation predicate in the knowledge graph is used as a positive sample, and relation names of incorrect relation predicates are used as negative samples.
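A minimal sketch of how the three kinds of training examples described in this embodiment could be assembled from one annotated question is given below; the field names, the random negative-sampling strategy and the number of negatives are assumptions for the example, not details taken from the invention.

```python
import random

def build_training_examples(question, main_entity, gold_name, gold_predicate,
                            all_entity_names, all_predicates, num_neg=3):
    """Build one example per sub-module from an annotated (question, answer triple) pair."""
    # 1) Main-entity recognition: the question and the gold entity span.
    span_example = {"question": question, "main_entity": main_entity}

    # 2) Entity linking: the correct KG entity name as a positive pair,
    #    randomly sampled wrong names as negative pairs.
    name_pool = [n for n in all_entity_names if n != gold_name]
    link_examples = [{"text_pair": (question, gold_name), "label": 1}]
    for name in random.sample(name_pool, min(num_neg, len(name_pool))):
        link_examples.append({"text_pair": (question, name), "label": 0})

    # 3) Relation prediction: the correct relation predicate vs. sampled wrong ones.
    pred_pool = [p for p in all_predicates if p != gold_predicate]
    rel_examples = [{"text_pair": (question, gold_predicate), "label": 1}]
    for pred in random.sample(pred_pool, min(num_neg, len(pred_pool))):
        rel_examples.append({"text_pair": (question, pred), "label": 0})

    return span_example, link_examples, rel_examples
```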
The third embodiment is different from the second embodiment in that: the BERT pre-training language models in the question main entity recognition sub-module, the entity link sub-module and the relation prediction sub-module share the BERT network parameters and do not share the output layer parameters.
Output layer parameters of the embedded BERT pre-training language model in the question main entity recognition submodule are obtained by training through training data of the question main entity recognition submodule, output layer parameters of the embedded BERT pre-training language model in the entity link submodule are obtained by training through training data of the entity link submodule, and output layer parameters of the embedded BERT pre-training language model in the relation prediction submodule are obtained by training through training data of the relation prediction submodule.
The BERT network parameters are obtained by jointly training on the three parts of training data. The three parts of training data are constructed from the natural language questions and the manually annotated answer triples.
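The joint optimization of the shared BERT parameters over the three parts of training data can be sketched as follows, reusing the SharedEncoderMultiTask module from the earlier sketch; the data-loader format, the way batches from the three tasks are interleaved, and the hyperparameters are assumptions for the example rather than values specified by the invention.

```python
import torch

def joint_train(model, loaders, epochs=3, lr=2e-5):
    """Alternate mini-batches from the three sub-module datasets. All gradients flow into
    the shared BERT parameters, while each output layer only receives gradients from its
    own task (zip truncates to the shortest loader; a sketch, not a tuned schedule)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    ce = torch.nn.CrossEntropyLoss()        # start/end position classification over characters
    bce = torch.nn.BCEWithLogitsLoss()      # correct/incorrect text-pair classification
    for _ in range(epochs):
        for span_b, link_b, rel_b in zip(loaders["span"], loaders["link"], loaders["rel"]):
            opt.zero_grad()
            # Span task: cross-entropy over start and end positions.
            logits = model(span_b["input_ids"], span_b["attention_mask"], task="span")
            loss = ce(logits[:, :, 0], span_b["start"]) + ce(logits[:, :, 1], span_b["end"])
            # Pair-classification tasks: entity linking and relation prediction.
            for batch, task in ((link_b, "link"), (rel_b, "rel")):
                score = model(batch["input_ids"], batch["attention_mask"], task=task).squeeze(-1)
                loss = loss + bce(score, batch["label"].float())
            loss.backward()
            opt.step()
```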
The fourth embodiment is different from the first embodiment in that: the BERT pre-training language model is trained using a cross-entropy loss function.
Fifth embodiment, the difference between the first embodiment and the second embodiment is: the question main entity identification submodule encodes the natural language question by utilizing a BERT pre-training language model, respectively obtains a vector of each character in the natural language question, and calculates the probability of each character serving as a starting character and an ending character of a main entity according to the obtained vectors;
$$p_s(c_i)=\frac{e^{\,w_s\cdot h_i}}{\sum_{k=0}^{L-1}e^{\,w_s\cdot h_k}}\qquad p_e(c_i)=\frac{e^{\,w_e\cdot h_i}}{\sum_{k=0}^{L-1}e^{\,w_e\cdot h_k}}$$

wherein c_i denotes the i-th character of the natural language question c; p_s(c_i) denotes the probability that the i-th character is the starting character of the main entity and p_e(c_i) the probability that it is the ending character; e denotes the base of the natural logarithm; h_i and h_k denote the vector representations of the i-th and k-th characters; w_s is a start-position discrimination vector used to score the likelihood that each character is the start position of the main entity, and w_e is an end-position discrimination vector used to score the likelihood that each character is the end position of the entity; k = 0, 1, …, L-1 indexes the characters of the natural language question c, and L denotes the total number of characters in c;
The character with the highest probability of being the starting character of the main entity is selected as the starting character, and among the characters following the starting character in the natural language question, the character with the highest probability of being the ending position is selected as the ending character of the main entity of the question.
The inner products of the vectors w_s and w_e with each character vector in the question serve as the start and end scores of each character. The scores are normalized with the softmax function, yielding the probability of each character being the start or end position of the main entity. The predicted start probabilities over all characters of the question sum to 1, as do the end probabilities, so each forms a valid probability distribution.
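A minimal sketch of this start/end prediction is given below; the "bert-base-chinese" checkpoint is an assumed choice, the vectors w_s and w_e are randomly initialized here (in practice they are learned during the joint training described above), and special-token handling is simplified.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")
w_s = torch.randn(768)  # start-position discrimination vector (learned during training in practice)
w_e = torch.randn(768)  # end-position discrimination vector (learned during training in practice)

def predict_main_entity(question: str) -> str:
    enc = tokenizer(question, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]            # token-to-character offsets
    with torch.no_grad():
        h = bert(**enc).last_hidden_state[0]          # one vector per token/character
    p_start = torch.softmax(h @ w_s, dim=0)           # p_s(c_i): softmax over positions
    p_end = torch.softmax(h @ w_e, dim=0)             # p_e(c_i): softmax over positions
    start = int(p_start[1:-1].argmax()) + 1           # skip the [CLS]/[SEP] special tokens
    end = start + int(p_end[start:-1].argmax())       # ending character chosen at or after the start
    return question[int(offsets[start][0]):int(offsets[end][1])]

# Example call (with untrained w_s/w_e the output is not yet meaningful):
# print(predict_main_entity("莫言的代表作是什么？"))
```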
Sixth embodiment, the difference between this embodiment and the first embodiment is: the entity link submodule is used for predicting the entity name of the main entity in the knowledge graph in the input natural language question, and the prediction method comprises the following steps:
The entity linking sub-module predicts, for each text pair formed by the input natural language question and a candidate entity name, the probability that the pair is correct, and the candidate entity name with the highest probability is selected as the predicted entity name.
The candidate entity names are obtained by using a pre-constructed entity link table, which contains all candidate entity names.
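A minimal sketch of this candidate ranking is given below; the entity link table format and the `score_pair` function (standing in for the entity-linking sub-module's BERT pair classifier) are assumptions for the example.

```python
def link_entity(question, mention, entity_link_table, score_pair):
    """Rank the knowledge-graph entity names listed for the recognized mention and
    return the best one. `score_pair(question, name)` is assumed to return the
    probability that the (question, entity name) pair is correct."""
    candidates = entity_link_table.get(mention, [])
    if not candidates:
        return None
    return max(candidates, key=lambda name: score_pair(question, name))

# Hypothetical usage with a pre-built link table and a trained pair scorer:
# table = {"莫言": ["莫言（中国作家）", "莫言（歌手）"]}
# best = link_entity("莫言的代表作是什么？", "莫言", table, score_pair=my_bert_pair_scorer)
```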
Seventh embodiment, the difference between this embodiment and the first embodiment is: the relation prediction submodule is used for predicting the relation name of a relation predicate in an input natural language question in a knowledge graph, and the prediction method comprises the following steps:
The relation prediction sub-module predicts, for each text pair formed by the input natural language question and a candidate relation predicate, the probability that the pair is correct, and the relation predicate with the highest probability is selected as the predicted relation predicate.
The candidate relation predicates are all relation predicates attached to the predicted entity in the knowledge graph.
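The following sketch ties the predicted entity and relation to the final answer lookup over knowledge-graph triples; the triple list format and the `score_pair` scorer (standing in for the relation-prediction sub-module's BERT pair classifier) are assumptions for the example.

```python
def answer_question(question, linked_entity, kg_triples, score_pair):
    """kg_triples: iterable of (head, relation, tail) triples from the knowledge graph.
    score_pair(question, predicate) is assumed to return the probability that the
    (question, relation predicate) pair is correct."""
    # Candidate predicates are all relations attached to the linked entity.
    candidates = {r for (h, r, t) in kg_triples if h == linked_entity}
    if not candidates:
        return None
    best_rel = max(candidates, key=lambda r: score_pair(question, r))
    # The answer entity is the tail entity connected to the linked entity via the predicted relation.
    for h, r, t in kg_triples:
        if h == linked_entity and r == best_rel:
            return t
    return None
```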
The present invention was trained and tested on the NLPCC-ICCPOL 2016 public dataset. The dataset contains a large-scale open-domain knowledge graph together with training and test sets composed of question-answer pairs. The knowledge-graph scale statistics are shown in Table 1 below.
TABLE 1 NLPCC-ICCPOL 2016 knowledgegraph Scale
Each question-answer pair comprises three parts, namely the original question sentence, the related triple and the answer to the question; their scales are shown in Table 2 below. The invention tested the performance of each sub-module and of the system as a whole on this dataset, as shown in Table 3 below.
TABLE 2 NLPCC-ICCPOL 2016 question-answer pair Scale
TABLE 3 respective modules and overall experimental results (%)
The above-described calculation examples of the present invention are merely to explain the calculation model and the calculation flow of the present invention in detail, and are not intended to limit the embodiments of the present invention. It will be apparent to those skilled in the art that other variations and modifications of the present invention can be made based on the above description, and it is not intended to be exhaustive or to limit the invention to the precise form disclosed, and all such modifications and variations are possible and contemplated as falling within the scope of the invention.
Claims (7)
1. The knowledge-graph question-answering system based on semantic space sharing is characterized by comprising a question main entity identification submodule, an entity linking submodule and a relation prediction submodule, wherein:
the question main entity recognition sub-module, the entity link sub-module and the relation prediction sub-module are internally embedded with a BERT pre-training language model; the BERT pre-training language model is obtained through the joint training of three sub-modules;
the question main entity identification submodule is used for coding the input natural language question, respectively obtaining the vector representation of each character in the natural language question, and determining the starting position and the ending position of a main entity according to the vector representation of each character to obtain the main entity in the input natural language question;
the entity link submodule is used for predicting the entity name of the main entity in the knowledge graph in the input natural language question;
the relation prediction submodule is used for predicting the relation name of a relation predicate in the input natural language question in the knowledge graph;
in the knowledge graph, a tail entity connected with the predicted entity name through the predicted relation predicate is an answer entity.
2. The knowledge-graph question-answering system based on semantic space sharing of claim 1, wherein the BERT pre-trained language model is obtained by joint training of three sub-modules, and the training data of the question main entity recognition sub-module is a natural language question and a main entity in the natural language question; the training data of the entity link sub-module is natural language question sentences and entity names of correct and wrong main entities in the knowledge graph, the entity names of the correct main entities in the knowledge graph are used as positive samples, and the entity names of the wrong main entities in the knowledge graph are used as negative samples; the training data of the relation prediction submodule is the relation names of natural language question sentences and correct and wrong relation predicates in the knowledge graph, the relation names of the correct relation predicates in the knowledge graph are used as positive samples, and the relation names of the wrong relation predicates in the knowledge graph are used as negative samples.
3. The knowledge-graph question-answering system based on semantic space sharing of claim 2, wherein the BERT pre-training language models in the question main entity recognition sub-module, the entity linking sub-module and the relationship prediction sub-module share BERT network parameters and do not share output layer parameters.
4. The semantic space sharing based knowledge-graph question answering system according to claim 1, wherein the BERT pre-trained language model is trained using a cross-entropy loss function.
5. The knowledge-graph question-answering system based on semantic space sharing according to claim 1, wherein the question main entity recognition sub-module encodes a natural language question by using a BERT pre-training language model, respectively obtains a vector of each character in the natural language question, and calculates the probability that each character is used as a start character and an end character of a main entity according to the obtained vectors;
$$p_s(c_i)=\frac{e^{\,w_s\cdot h_i}}{\sum_{k=0}^{L-1}e^{\,w_s\cdot h_k}}\qquad p_e(c_i)=\frac{e^{\,w_e\cdot h_i}}{\sum_{k=0}^{L-1}e^{\,w_e\cdot h_k}}$$

wherein c_i denotes the i-th character of the natural language question c; p_s(c_i) denotes the probability that the i-th character is the starting character of the main entity and p_e(c_i) the probability that it is the ending character of the main entity; e denotes the base of the natural logarithm; h_i and h_k denote the vector representations of the i-th and k-th characters; w_s is a start-position discrimination vector and w_e is an end-position discrimination vector; k = 0, 1, …, L-1 indexes the characters of the natural language question c, and L denotes the total number of characters in c;
and selecting the character with the maximum probability as the starting character of the main entity in the natural language question as the starting character of the main entity, and selecting the character with the maximum probability as the ending position from the characters behind the starting character of the main entity of the natural language question as the ending character of the main entity of the question.
6. The knowledge-graph question-answering system based on semantic space sharing according to claim 1, wherein the entity link sub-module is used for predicting the entity name of the main entity in the knowledge-graph in the input natural language question, and the prediction method is as follows:
and predicting the correct probability of the text pair formed by the input natural language question and all the candidate entity names by using an entity link submodule, and selecting the candidate entity name with the highest correct probability as the predicted entity name.
7. The knowledge-graph question-answering system based on semantic space sharing according to claim 1, wherein the relation prediction submodule is used for predicting the relation name of a relation predicate in an input natural language question in a knowledge graph, and the prediction method is as follows:
and predicting the correct probability of the text pair formed by the input natural language question and each candidate relation predicate by using a relation prediction submodule, and selecting the relation predicate with the highest correct probability as the predicted relation predicate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010827800.6A CN111897944B (en) | 2020-08-17 | 2020-08-17 | Knowledge graph question-answering system based on semantic space sharing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010827800.6A CN111897944B (en) | 2020-08-17 | 2020-08-17 | Knowledge graph question-answering system based on semantic space sharing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111897944A (en) | 2020-11-06 |
CN111897944B CN111897944B (en) | 2024-03-22 |
Family
ID=73230642
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010827800.6A Active CN111897944B (en) | 2020-08-17 | 2020-08-17 | Knowledge graph question-answering system based on semantic space sharing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111897944B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107748757A (en) * | 2017-09-21 | 2018-03-02 | 北京航空航天大学 | A kind of answering method of knowledge based collection of illustrative plates |
CN108509519A (en) * | 2018-03-09 | 2018-09-07 | 北京邮电大学 | World knowledge collection of illustrative plates enhancing question and answer interactive system based on deep learning and method |
CN111339269A (en) * | 2020-02-20 | 2020-06-26 | 来康科技有限责任公司 | Knowledge graph question-answer training and application service system with automatically generated template |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507099A (en) * | 2020-12-18 | 2021-03-16 | 北京百度网讯科技有限公司 | Training method, device, equipment and storage medium of dialogue understanding model |
CN113360606A (en) * | 2021-06-24 | 2021-09-07 | 哈尔滨工业大学 | Knowledge graph question-answer joint training method based on Filter |
CN113449038A (en) * | 2021-06-29 | 2021-09-28 | 东北大学 | Mine intelligent question-answering system and method based on self-encoder |
CN113449038B (en) * | 2021-06-29 | 2024-04-26 | 东北大学 | Mine intelligent question-answering system and method based on self-encoder |
CN113449001A (en) * | 2021-07-12 | 2021-09-28 | 中国银行股份有限公司 | Knowledge processing method and device and electronic equipment |
CN113449001B (en) * | 2021-07-12 | 2024-02-20 | 中国银行股份有限公司 | Knowledge processing method and device and electronic equipment |
CN113836281A (en) * | 2021-09-13 | 2021-12-24 | 中国人民解放军国防科技大学 | Entity relation joint extraction method based on automatic question answering |
CN117216194A (en) * | 2023-11-08 | 2023-12-12 | 天津恒达文博科技股份有限公司 | Knowledge question-answering method and device, equipment and medium in literature and gambling field |
CN117216194B (en) * | 2023-11-08 | 2024-01-30 | 天津恒达文博科技股份有限公司 | Knowledge question-answering method and device, equipment and medium in literature and gambling field |
Also Published As
Publication number | Publication date |
---|---|
CN111897944B (en) | 2024-03-22 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |