CN115759254A - Question-answering method, system and medium based on knowledge-enhanced generative language model - Google Patents

Question-answering method, system and medium based on knowledge-enhanced generative language model

Info

Publication number
CN115759254A
CN115759254A (application number CN202211421056.5A)
Authority
CN
China
Prior art keywords
model
knowledge
knowledge graph
sub
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211421056.5A
Other languages
Chinese (zh)
Inventor
林瑞玉
陈官正
梁上松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202211421056.5A priority Critical patent/CN115759254A/en
Publication of CN115759254A publication Critical patent/CN115759254A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of natural language processing, and in particular to a question-answering method, system and medium based on a knowledge-enhanced generative language model. The method divides the knowledge graph by relation and trains a sub-knowledge-graph model for each relation, so that the generative language model can learn more of the relation semantics in the knowledge graph and its question-answering ability is improved. In addition, when training entity prediction with masked language modeling, the method exploits the characteristics of the Bart model and constructs a loss function with contrastive learning, which alleviates the large entity prediction space and the difficulty of predicting multi-token entities, and improves the training efficiency of the knowledge-enhanced generative language model.

Description

Question-answering method, system and medium based on knowledge enhanced generative language model
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a question-answering method, system and medium based on a knowledge-enhanced generative language model.
Background
Pre-training has become an important approach in natural language processing today. Pre-training learns semantic representations of words in context through self-supervised learning on large-scale text corpora. The Bart language model is a representative pre-trained language model: a denoising autoencoder for pre-training sequence-to-sequence models, it achieves good results on text generation, language understanding, abstractive dialogue, question answering and summarization tasks; its network structure consists of a bidirectional encoder and a left-to-right autoregressive decoder. Choosing the pre-trained general-purpose Bart language model further saves training time and computing resources, and a generative question-answering system can reach good accuracy more quickly. The BartForConditionalGeneration model adds a language modeling head network on top of the last layer of the Bart language model and can be used for text generation tasks.
Although existing pre-trained models have learned a large amount of useful linguistic knowledge from text corpora, they often fail to capture factual knowledge well and perform poorly on knowledge-intensive tasks that require external knowledge, such as question answering and entity linking. Such knowledge tends to appear sparsely and in complex forms in text. Therefore, introducing a knowledge graph containing a large amount of structured factual knowledge to enhance the pre-trained language model is an important method for solving knowledge-intensive tasks, and has become an effective solution to this problem in the prior art. A knowledge graph consists of a series of triples, each of which comprises (head entity, relation, tail entity). At present, most work injects knowledge related to the knowledge graph into the pre-trained model through joint learning of knowledge embedding and masked language modeling objectives. However, the following problems still exist in the prior art:
(1) Most knowledge injection methods directly embed the words of the relation as input and ignore the fact that the same relation may have different expressions in text. For example, when nationality is expressed, the knowledge graph uses a single relation such as person.nationality, while the text contains several different expressions with the same meaning.
(2) Entity prediction is the method most commonly used when training knowledge embeddings: for example, a triple is constructed as (head entity word embedding + relation word embedding + mask [MASK]) and the model predicts which tail entity the masked position [MASK] corresponds to, which is a multi-class classification task. However, when the knowledge graph is large and the number of entities is very large, the prediction space and the cost of classification increase.
(3) When entity prediction is pre-trained, the name of the tail entity may consist of several words (tokens). In masked language modeling, if several tokens are masked, generating multi-token predictions is simple for a traditional left-to-right language model, which can decode the next token autoregressively using the previously generated tokens as a prior. However, many pre-trained models such as BERT are encoders that predict a single token given the left and right context, and decoding multiple tokens from such encodings remains a difficult problem. In addition, the masked positions are usually predicted by a fully connected neural network followed by a softmax classification loss; the parameters of this fully connected network often account for more than 20% of the parameters of the pre-trained language model, so the mask-then-predict method depends heavily on the head parameters of the language model. Yet these important parameters are usually discarded when fine-tuning for downstream tasks.
Disclosure of Invention
In view of the above, a first object of the present invention is to provide a question-answering method based on a knowledge-enhanced generative language model, which divides the knowledge graph by relation and trains a sub-knowledge-graph model for each relation, so that the generative language model can learn more relation semantics in the knowledge graph, thereby improving its question-answering ability.
Based on the same inventive concept, the second purpose of the invention is to provide a question-answering system based on a knowledge-enhanced generative language model.
Based on the same inventive concept, a third object of the present invention is to provide a storage medium.
The first purpose of the invention can be achieved by the following technical scheme:
a question-answering method based on a knowledge-enhanced generative language model is characterized by comprising the following steps:
acquiring data, wherein the data comprises a question and answer data set and a knowledge graph data set;
dividing the knowledge graph data set according to the relation attributes, and training a plurality of mutually independent sub-knowledge graph models based on the division results, wherein each sub-knowledge graph model is constructed on the basis of a Bart model;
connecting a plurality of sub-knowledge-graph models to the BartForConditionalGeneration model using the Adapter technique to construct a knowledge-graph-enhanced generative language model;
training a knowledge-graph-enhanced generative language model using a question and answer dataset;
inputting the question into the trained model to obtain the answer of the question;
and adding a language modeling head network for text generation tasks at the last layer of the Bart model to obtain the BartForConditionalGeneration model.
Further, the knowledge graph data set is divided according to the relation attributes, and a plurality of mutually independent sub-knowledge graph models are trained based on the division results, and the method comprises the following steps:
dividing the knowledge graph data sets according to the relation attributes to obtain a plurality of sub knowledge graph data sets;
converting the text of the sub-knowledge-graph dataset into a text vector suitable for Bart model input;
based on a Bart model, connecting an Adapter network on each transformer layer of an encoder and a decoder of the Bart model to construct a sub-knowledge graph model architecture;
and using masked language modeling to set the training target of the sub-knowledge-graph models, and training a plurality of mutually independent sub-knowledge-graph models with a contrastive learning method.
Further, the method for dividing the knowledge graph data sets according to the relationship attributes to obtain a plurality of sub knowledge graph data sets comprises the following steps:
acquiring triple data of the knowledge graph data set, and determining triple data corresponding to each relation attribute by taking the relation attribute in the triple data as a label;
selecting the top M relation attributes with the largest number of corresponding triples;
and randomly sampling the corresponding triple data according to the selected relation attributes to obtain M sub-knowledge-graph datasets of the same size.
Furthermore, masked language modeling is used to set the training target of the sub-knowledge-graph models, and a contrastive learning method is used to train a plurality of mutually independent sub-knowledge-graph models, specifically:
a positive sample is constructed as (head entity + relation + [MASK], tail entity), other samples in the same sub-knowledge-graph dataset serve as negative samples, and [MASK] is a special token in the Bart model; the output [LABEL] encodings of the model are taken as the text vectors of (head entity + relation + [MASK]) and (tail entity) respectively, where [LABEL] is the encoding of the last token in the hidden-state sequence output by the last layer of the model decoder; the total loss function is set to the normalized temperature-scaled cross entropy over the [LABEL] encoding vectors of the samples in the same sub-knowledge-graph dataset, with the following expression:
L = -\sum_{i=1}^{N} \log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k \neq i} \exp(\mathrm{sim}(z_i, z_k)/\tau)}
wherein N is the number of samples in a sub-knowledge-graph dataset, (z_i, z_j) are mutually positive samples, (z_i, z_k) are mutually negative samples, sim is a mean square error function of the two vectors, and τ is a temperature parameter used to control the sensitivity of the loss function to negative sample pairs.
Furthermore, in the training process of the sub-knowledge-graph models, all parameters of the Bart model are frozen so that they do not participate in the gradient descent of training, and the Adapter networks are trained alone to extract the information of the knowledge graph.
Further, the knowledge graph data set is divided according to the relation attributes, and a plurality of mutually independent sub-knowledge graph models are trained based on the division results, and the method further comprises the following steps:
the semantics of the relation are set as the prompt parameters of the sub-knowledge graph model, so that the sub-knowledge graph model automatically learns the appropriate relation embedding in the training process.
Further, a plurality of sub-knowledge-graph models are connected to the BartForConditionalGeneration model using the Adapter technique to construct the knowledge-graph-enhanced generative language model, specifically:
an Adapter layer network is inserted into each transformer layer of the model, the trained encoder Adapter networks of the sub-knowledge-graph models are connected to the Adapter of the BartForConditionalGeneration encoder, and the decoder Adapter networks are connected to the Adapter of the BartForConditionalGeneration decoder, obtaining the knowledge-graph-enhanced generative language model.
Furthermore, during the training of the knowledge-graph-enhanced generative language model, the language model parameters, the Adapter layer parameters and the Adapter fusion layer parameters all participate in fine-tuning.
The second purpose of the invention can be achieved by the following technical scheme:
a question-answering system based on a knowledge-enhanced generative language model, comprising:
the data acquisition module is used for acquiring data, wherein the data comprises a question and answer data set and a knowledge graph data set;
the sub-knowledge graph model module is used for dividing the knowledge graph data set according to the relation attributes and training a plurality of mutually independent sub-knowledge graph models based on the division result, wherein each sub-knowledge graph model is constructed on the basis of a Bart model;
the generative language model module is used for connecting a plurality of sub-knowledge-graph models to the BartForConditionalGeneration model using the Adapter technique, constructing a knowledge-graph-enhanced generative language model, and training the knowledge-graph-enhanced generative language model with the question-answer dataset;
the output module is used for inputting the question into the generative language model module to obtain the answer of the question and outputting the answer;
and adding a language modeling head network for text generation tasks at the last layer of the Bart model to obtain the BartForConditionalGeneration model.
The third purpose of the invention can be achieved by the following technical scheme:
a storage medium stores a program that, when executed by a processor, implements the above-described question-answering method based on a knowledge-enhanced generative language model.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention divides the knowledge graph with the relation as the unit; compared with the dividing methods in the prior art, it can learn more meaningful relation-based sub-model Adapter parameters.
(2) When training the sub-graph models, the invention sets the semantics of the relation as an independently learnable prompt parameter, so that the sub-model automatically learns an appropriate relation embedding during training, which solves the problem that the same relation may have different expressions in text.
(3) When training entity prediction with masked language modeling, the loss function is constructed with contrastive learning in combination with the characteristics of the Bart model, which alleviates the large entity prediction space and the difficulty of predicting multi-token entities, and improves training efficiency.
(4) By freezing the model parameters, the pre-trained language model learns to extract the information of the knowledge graph through the Adapter network without losing the knowledge learned in prior pre-training.
Drawings
FIG. 1 is a flowchart of a method of example 1 of the present invention;
fig. 2 is a diagram illustrating conversion of input text data into text vectors in embodiment 1 of the present invention;
FIG. 3 is a schematic structural diagram of each transformer layer of the sub-knowledge-graph model according to embodiment 1 of the present invention;
FIG. 4 is an input/output schematic diagram of the sub-knowledge graph model according to embodiment 1 of the present invention;
FIG. 5 is a schematic diagram of a training method for knowledge-graph information extraction in embodiment 1 of the present invention;
FIG. 6 is a schematic diagram of the structure of each transformer layer of the knowledge-graph-enhanced generative language model according to embodiment 1 of the present invention;
FIG. 7 is a schematic diagram of the structure of a knowledge-graph enhanced generative language model according to embodiment 1 of the present invention;
FIG. 8 is an input/output diagram of the knowledge-graph enhanced generative language model according to embodiment 1 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments. It is obvious that the described embodiments are only some, not all, embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort based on these embodiments fall within the protection scope of the present invention.
Example 1:
as shown in fig. 1, the present embodiment provides a question-answering method based on a knowledge-enhanced generative language model, which includes the following steps:
s100, acquiring data, wherein the data comprises a question and answer data set and a knowledge graph data set;
in this embodiment, the question-answer datasets are the training, validation and test sets of the WebQuestions, NaturalQuestion and TriviaQA datasets downloaded from their open-source dataset websites; the knowledge graph dataset is Wikidata5m;
the data of the NaturalQuestion dataset comes from web search, and each question is accompanied by a Wikipedia article containing the answer. The training set comprises 107369 question-answer pairs, the validation set 900 pairs, and the test set 900 pairs;
the WebQuestions dataset also comes from web search, and the answer to each question corresponds to an entity in the knowledge graph. The training set comprises 3778 question-answer pairs, the validation set 1016 pairs, and the test set 1016 pairs;
TriviaQA is a question dataset collected from quiz-league websites. Each question is accompanied by a Wikipedia web page and a web page matched by web search. The training set comprises 76496 question-answer pairs, the validation set 4975 pairs, and the test set 4976 pairs;
the Wikidata5m knowledge graph dataset integrates the Wikidata knowledge graph and Wikipedia pages; each entity in it is described by a corresponding Wikipedia page, and the graph is represented by a series of triples. Its transductive setting is used here, which contains 4813491 entities, 825 relations and 20614279 triples;
in this embodiment, only the questions in the datasets and their corresponding answers are used for training and testing the model; the documents and web pages attached to each question are not used;
in this embodiment, since a question may have multiple answers in the question datasets, the multiple answers to the same question are split into multiple training samples with the same question, and during validation and testing an output answer is judged correct as long as it is one of the answers in the answer list (a sketch of this splitting is given after Table 1). The distribution of the post-processed datasets is shown in Table 1:
data set Number of samples in training set Number of samples in verification set Number of samples in test set
WebQuestions 8933 1016 1016
NaturalQuestion 107369 900 900
TriviaQA 961091 4975 4976
TABLE 1
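The splitting described above can be sketched in Python as follows; this is an illustrative sketch and the field names "question" and "answers" are assumptions, not part of the disclosure:

```python
# Sketch of the pre-processing step: one training sample per (question, answer) pair,
# while validation/testing keeps the full answer list for matching.
def expand_training_samples(dataset):
    """dataset: iterable of {"question": str, "answers": [str, ...]} (assumed fields)."""
    samples = []
    for item in dataset:
        for answer in item["answers"]:
            samples.append({"question": item["question"], "answer": answer})
    return samples

def is_listed_answer(predicted, answers):
    """At validation/test time the output only needs to match one listed answer."""
    return predicted in answers
```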
S200, dividing the knowledge graph data set according to the relation attributes, and training a plurality of mutually independent sub-knowledge graph models based on the division results, wherein each sub-knowledge graph model is constructed on the basis of a Bart model, and the method comprises the following steps:
s210, dividing the knowledge graph data sets according to the relation attributes to obtain a plurality of sub knowledge graph data sets, and the method comprises the following steps:
s211, acquiring triple data of the knowledge graph data set, and determining triple data corresponding to each relation attribute by taking the relation attribute in the triple data as a label;
s212, selecting the top 20 relation attributes with the largest number of corresponding triples according to the amount of triple data per relation attribute;
and S213, randomly sampling the corresponding triple data according to the selected relation attributes, randomly selecting 5000 triples per relation, and obtaining 20 sub-knowledge-graph datasets of the same size as the training data of the sub-knowledge-graph models.
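A minimal Python sketch of steps S211 to S213, assuming the knowledge graph is available as (head, relation, tail) tuples; the counts 20 and 5000 follow this embodiment:

```python
import random
from collections import defaultdict

def split_by_relation(triples, top_m=20, samples_per_relation=5000, seed=0):
    """triples: iterable of (head, relation, tail). Returns {relation: [triples]}."""
    by_relation = defaultdict(list)
    for head, relation, tail in triples:           # S211: group triples by relation attribute
        by_relation[relation].append((head, relation, tail))
    # S212: keep the M relations with the most triples
    top_relations = sorted(by_relation, key=lambda r: len(by_relation[r]), reverse=True)[:top_m]
    rng = random.Random(seed)
    # S213: random-sample equally sized sub-knowledge-graph datasets
    return {r: rng.sample(by_relation[r], samples_per_relation) for r in top_relations}
```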
S220, as shown in FIG. 2, the input text of the sub-knowledge-graph dataset is tokenized by the tokenizer provided with the Bart model, converted into vocabulary token ids, and then converted into text vectors through the embedding network of the Bart model.
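Taking the Hugging Face Transformers implementation of Bart as an assumed concrete realization of S220 (the checkpoint name facebook/bart-base is an assumption), the conversion might look like:

```python
import torch
from transformers import BartTokenizer, BartModel

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")  # assumed checkpoint
bart = BartModel.from_pretrained("facebook/bart-base")

def to_text_vectors(text):
    # Tokenize into vocabulary ids, then embed through Bart's input embedding layer.
    ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=64).input_ids
    with torch.no_grad():
        return bart.get_input_embeddings()(ids)   # shape: (1, seq_len, hidden_size)
```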
S230, as shown in FIG. 3, based on the Bart model, an Adapter network is connected to each transformer layer of the encoder and decoder of the Bart model to construct the sub-knowledge-graph model architecture;
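The Adapter network of S230 is commonly realized as a bottleneck residual module; a minimal sketch (the bottleneck width 64 is an assumption, not taken from the patent) is:

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted into a transformer layer (Houlsby-style sketch)."""
    def __init__(self, hidden_size, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden_states):
        # Residual connection keeps the frozen Bart representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```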
s240, the semantics of the relation are set as prompt parameters of the sub-knowledge-graph model, so that the sub-model automatically learns an appropriate relation embedding during training.
S250, as shown in FIG. 4 and FIG. 5, masked language modeling is used to set the training target of the sub-knowledge-graph models, and a contrastive learning method is used to train a plurality of mutually independent sub-knowledge-graph models, specifically:
a positive sample is constructed as (head entity + relation + [MASK], tail entity), other samples in the same sub-knowledge-graph dataset serve as negative samples, and [MASK] is a special token in the Bart model; the output [LABEL] encodings of the model are taken as the text vectors of (head entity + relation + [MASK]) and (tail entity) respectively, where [LABEL] is the encoding of the last token in the hidden-state sequence output by the last layer of the model decoder; the total loss function is set to the normalized temperature-scaled cross entropy over the [LABEL] encoding vectors of the samples in the same sub-knowledge-graph dataset, with the following expression:
L = -\sum_{i=1}^{N} \log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k \neq i} \exp(\mathrm{sim}(z_i, z_k)/\tau)}
wherein N is the number of samples in a sub-knowledge-graph dataset, (z_i, z_j) are mutually positive samples, (z_i, z_k) are mutually negative samples, sim is a mean square error function of the two vectors, and τ is a temperature parameter used to control the sensitivity of the loss function to negative sample pairs; in this embodiment, τ is set to 0.04;
in the expression, the numerator represents the similarity between sample i and its corresponding positive sample in the same sub-knowledge-graph data: the larger the similarity between positive samples, the better. The denominator represents the similarity between sample i and all its corresponding negative samples k in the same sub-knowledge-graph data: the smaller the similarity between negative samples, the better.
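A hedged sketch of this loss for one sub-knowledge-graph batch, reading the "mean square error function" sim as a negative mean squared error so that larger values correspond to more similar vectors (an interpretation, not stated in the disclosure):

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(query_vecs, key_vecs, tau=0.04):
    """query_vecs[i]: [LABEL] vector of (head + relation + [MASK]) for sample i.
    key_vecs[i]: [LABEL] vector of its tail entity; other rows act as negatives.
    Similarity is taken as negative MSE (an assumed reading of 'sim')."""
    n = query_vecs.size(0)
    # sim[i, k] = -MSE(query_i, key_k) / tau; larger means more similar.
    diff = query_vecs.unsqueeze(1) - key_vecs.unsqueeze(0)         # (n, n, d)
    sim = -(diff ** 2).mean(dim=-1) / tau                          # (n, n)
    targets = torch.arange(n, device=query_vecs.device)            # positives on the diagonal
    return F.cross_entropy(sim, targets)                           # NT-Xent / InfoNCE form
```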
In this embodiment, the parameter settings are: input length 64, output length 64, batch size 16, learning rate 0.00001, and 10 training epochs. All parameters of the Bart model are frozen during the training of the sub-knowledge-graph models so that they do not participate in the gradient descent of training; the Adapter networks are trained alone to extract the information of the knowledge graph, and finally the trained Adapter layer parameters are saved.
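The freezing of the Bart parameters can be sketched as follows (treating the substring "adapter" in parameter names as an assumed naming convention):

```python
def freeze_bart_train_adapters(model):
    # Freeze every original Bart weight so it receives no gradient updates...
    for param in model.parameters():
        param.requires_grad = False
    # ...then re-enable gradients only for the inserted Adapter layers.
    for name, param in model.named_parameters():
        if "adapter" in name:        # assumed naming convention for Adapter parameters
            param.requires_grad = True
```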
S300, connecting the multiple sub-knowledge-graph models to the BartForConditionalGeneration model using the Adapter technique to construct the knowledge-graph-enhanced generative language model; a language modeling head network for text generation tasks is added at the last layer of the Bart model to obtain the BartForConditionalGeneration model.
Specifically, as shown in FIG. 6 and FIG. 7, an Adapter layer network is inserted into each transformer layer of the model, the trained encoder Adapter networks of the sub-knowledge-graph models are connected to the Adapter of the BartForConditionalGeneration encoder, and the decoder Adapter networks are connected to the Adapter of the BartForConditionalGeneration decoder, obtaining the knowledge-graph-enhanced generative language model.
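One possible reading of this fusion step, sketched below as an AdapterFusion-style attention over the outputs of the trained sub-knowledge-graph adapters (the value projection is omitted for brevity; this is an assumed sketch, not the literal network of the patent):

```python
import torch
import torch.nn as nn

class AdapterFusion(nn.Module):
    """Attention over the outputs of several frozen, pre-trained adapters."""
    def __init__(self, hidden_size):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, adapter_outputs):
        # hidden_states: (batch, seq, hidden); adapter_outputs: (batch, seq, num_adapters, hidden)
        q = self.query(hidden_states).unsqueeze(2)               # (b, s, 1, h)
        k = self.key(adapter_outputs)                            # (b, s, a, h)
        scores = torch.softmax((q * k).sum(-1), dim=-1)          # (b, s, a) attention over adapters
        fused = (scores.unsqueeze(-1) * adapter_outputs).sum(2)  # (b, s, h)
        return hidden_states + fused
```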
S400, training a knowledge graph enhanced generative language model by using a question and answer data set;
in this embodiment, as shown in FIG. 8, the input of the knowledge-graph-enhanced generative language model is a question from the question-answer dataset, and the output is the answer to the corresponding question. During training, the language model parameters, the 20 Adapter layers and the Adapter fusion layer parameters all participate in fine-tuning. The parameter settings are: input length 64, output length 64, batch_size 16, learning rate 0.00001, and 20 training epochs. An AdamW optimizer and a linear learning-rate schedule are used during training.
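The optimizer and schedule named above correspond to standard PyTorch / Transformers components; an illustrative setup (the number of warm-up steps, set to 0 here, is an assumption):

```python
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

def build_optimizer(model, train_steps, lr=1e-5):
    # Only parameters left trainable (language model, Adapter and Adapter-fusion layers)
    # are handed to AdamW; the learning rate decays linearly over training.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = AdamW(params, lr=lr)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=train_steps)
    return optimizer, scheduler
```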
In this embodiment, the EM (Exact Match) metric is used as the criterion of accuracy: after the model outputs an answer, punctuation, articles and extra spaces are removed and the text is converted to lower case, then compared with the lower-case form of each answer in the dataset's answer list; the output is judged correct if it matches one of them.
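A sketch of the EM normalization and matching described above:

```python
import re
import string

def normalize_answer(text):
    """Lower-case, strip punctuation, articles and extra whitespace (EM convention)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, answer_list):
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ans) for ans in answer_list)
```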
And S500, inputting the question into the trained model to obtain the answer of the question.
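Question answering with the trained model then reduces to conditional generation; a sketch of S500 using the assumed Hugging Face interface:

```python
def answer_question(model, tokenizer, question, max_length=64):
    # Encode the question, let the knowledge-enhanced model generate, and decode the answer.
    inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=64)
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```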
In summary, when dividing the knowledge graph, this embodiment divides it with the relation as the unit; compared with the dividing methods in the prior art, it can learn more meaningful relation-based sub-model Adapter parameters. When training the sub-graph models, the semantics of the relation are set as independently learnable prompt parameters, so that the sub-model automatically learns an appropriate relation embedding during training, which solves the problem that the same relation may have different expressions in text. When training entity prediction with masked language modeling, the loss function is constructed with contrastive learning in combination with the characteristics of the Bart model, which alleviates the large entity prediction space and the difficulty of predicting multi-token entities, and improves training efficiency. By freezing the model parameters, the pre-trained language model learns to extract the information of the knowledge graph through the Adapter network without losing the knowledge learned in prior pre-training.
Example 2:
the embodiment provides a question answering system based on a knowledge enhancement generative language model, which comprises:
the data acquisition module is used for acquiring data, wherein the data comprises a question and answer data set and a knowledge graph data set;
the sub-knowledge-graph model module is used for dividing the knowledge graph dataset according to the relation attributes and training a plurality of mutually independent sub-knowledge-graph models based on the division results, wherein each sub-knowledge-graph model is constructed on the basis of a Bart model;
the generative language model module is used for connecting a plurality of sub-knowledge-graph models to the BartForConditionalGeneration model using the Adapter technique, constructing a knowledge-graph-enhanced generative language model, and training the knowledge-graph-enhanced generative language model with the question-answer dataset;
the output module is used for inputting the question into the generative language model module to obtain the answer of the question and outputting the answer;
and adding a language modeling head network for text generation tasks at the last layer of the Bart model to obtain the BartForConditionalGeneration model.
That is to say, among the above modules of this embodiment, the data acquisition module implements step S100 of embodiment 1, the sub-knowledge-graph model module implements step S200 of embodiment 1, the generative language model module implements steps S300 and S400 of embodiment 1, and the output module implements step S500 of embodiment 1; since steps S100 to S500 have been described in detail in embodiment 1, for brevity of the specification the detailed implementation of each module in this embodiment refers to embodiment 1 and is not repeated here.
Example 3:
this embodiment provides a storage medium storing a program which, when executed by a processor, implements the question-answering method based on the knowledge-enhanced generative language model of embodiment 1 of the present invention, specifically:
acquiring data, wherein the data comprises a question and answer data set and a knowledge graph data set;
dividing the knowledge graph data set according to the relation attributes, and training a plurality of mutually independent sub-knowledge graph models based on the division results, wherein each sub-knowledge graph model is constructed on the basis of a Bart model;
connecting a plurality of sub-knowledge-graph models to the BartForConditionalGeneration model using the Adapter technique to construct a knowledge-graph-enhanced generative language model;
training a knowledge-graph-enhanced generative language model using a question and answer dataset;
inputting the question into the trained model to obtain the answer of the question;
and adding a language modeling head network for text generation tasks at the last layer of the Bart model to obtain the BartForConditionalGeneration model.
It should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present embodiment, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with a computer readable program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic signals, optical signals, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
A computer program for carrying out the present embodiments may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Python and C++, and conventional procedural programming languages such as C or similar languages. The program may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be understood that the embodiments described above are only a few embodiments of the present invention, and the present invention is not limited to the details of the above embodiments, and that any suitable changes or modifications by one of ordinary skill in the art may be made without departing from the scope of the present invention.

Claims (10)

1. A question-answering method based on a knowledge-enhanced generative language model, characterized by comprising the following steps:
acquiring data, wherein the data comprises a question and answer data set and a knowledge graph data set;
dividing the knowledge graph data set according to the relation attributes, and training a plurality of mutually independent sub-knowledge graph models based on the division results, wherein each sub-knowledge graph model is constructed on the basis of a Bart model;
connecting a plurality of sub-knowledge-graph models to the BartForConditionalGeneration model using the Adapter technique to construct a knowledge-graph-enhanced generative language model;
training a knowledge-graph-enhanced generative language model using a question-answer data set;
inputting the question into a trained model to obtain an answer to the question;
and adding a language modeling head network for text generation tasks at the last layer of the Bart model to obtain the BartForConditionalGeneration model.
2. The question answering method based on the knowledge enhanced generative language model according to claim 1, wherein the knowledge graph data set is divided according to the relationship attributes, and a plurality of mutually independent sub-knowledge graph models are trained based on the division results, comprising the following steps:
dividing the knowledge graph data sets according to the relation attributes to obtain a plurality of sub knowledge graph data sets;
converting the text of the sub-knowledge-graph dataset into a text vector suitable for Bart model input;
based on a Bart model, connecting an Adapter network on each transformer layer of an encoder and a decoder of the Bart model to construct a sub-knowledge graph model architecture;
and using masked language modeling to set the training target of the sub-knowledge-graph models, and training a plurality of mutually independent sub-knowledge-graph models with a contrastive learning method.
3. The question-answering method based on the knowledge-enhanced generative language model as claimed in claim 2, wherein the knowledge-graph data sets are divided according to the relationship attributes to obtain a plurality of sub-knowledge-graph data sets, comprising the steps of:
acquiring triple data of the knowledge graph data set, and determining triple data corresponding to each relation attribute by taking the relation attribute in the triple data as a label;
selecting the top M relation attributes with the largest number of corresponding triples;
and randomly sampling the corresponding triple data according to the selected relation attributes to obtain M sub-knowledge-graph datasets of the same size.
4. The question-answering method based on the knowledge-enhanced generative language model according to claim 2, wherein masked language modeling is used, the training targets of the sub-knowledge-graph models are set, and a contrastive learning method is used to train a plurality of mutually independent sub-knowledge-graph models, specifically:
a positive sample is constructed as (head entity + relation + [MASK], tail entity), other samples in the same sub-knowledge-graph dataset serve as negative samples, and [MASK] is a special token in the Bart model; the output [LABEL] encodings of the model are taken as the text vectors of (head entity + relation + [MASK]) and (tail entity) respectively, where [LABEL] is the encoding of the last token in the hidden-state sequence output by the last layer of the model decoder; the total loss function is set to the normalized temperature-scaled cross entropy over the [LABEL] encoding vectors of the samples in the same sub-knowledge-graph dataset, with the following expression:
L = -\sum_{i=1}^{N} \log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k \neq i} \exp(\mathrm{sim}(z_i, z_k)/\tau)}
wherein N is the number of samples in a sub-knowledge-graph dataset, (z_i, z_j) are mutually positive samples, (z_i, z_k) are mutually negative samples, sim is a mean square error function of the two vectors, and τ is a temperature parameter used to control the sensitivity of the loss function to negative sample pairs.
5. The question answering method based on the knowledge enhanced generative language model according to claim 4, wherein in the training process of the sub-knowledge graph model, all parameters of the Bart model are frozen, so that all parameters of the Bart model do not participate in gradient descent of training, and an Adapter network is trained independently to extract information of the knowledge graph.
6. The knowledge-enhanced generative language model-based question-answering method according to claim 2, wherein the knowledge graph dataset is divided according to the relationship attributes, and a plurality of mutually independent sub-knowledge graph models are trained based on the division results, further comprising the steps of:
the semantics of the relation are set as the prompt parameters of the sub-knowledge graph model, so that the sub-knowledge graph model automatically learns the appropriate relation embedding in the training process.
7. The question-answering method based on the knowledge-enhanced generative language model according to any one of claims 2 to 6, wherein a plurality of sub-knowledge-graph models are connected to the BartForConditionalGeneration model using the Adapter technique to construct the knowledge-graph-enhanced generative language model, specifically:
an Adapter layer network is inserted into each transformer layer of the model, the trained encoder Adapter networks of the sub-knowledge-graph models are connected to the Adapter of the BartForConditionalGeneration encoder, and the decoder Adapter networks are connected to the Adapter of the BartForConditionalGeneration decoder, obtaining the knowledge-graph-enhanced generative language model.
8. The question-answering method based on the knowledge-enhanced generative language model according to claim 7, wherein the language model parameters, the Adapter layer parameters and the Adapter fusion layer parameters all participate in fine-tuning during the training of the knowledge-graph-enhanced generative language model.
9. A question-answering system based on a knowledge-enhanced generative language model, comprising:
the data acquisition module is used for acquiring data, wherein the data comprises a question and answer data set and a knowledge graph data set;
the sub-knowledge graph model module is used for dividing the knowledge graph data set according to the relation attributes and training a plurality of mutually independent sub-knowledge graph models based on the division result, wherein each sub-knowledge graph model is constructed on the basis of a Bart model;
the generative language model module is used for connecting a plurality of sub-knowledge-graph models to the BartForConditionalGeneration model using the Adapter technique, constructing a knowledge-graph-enhanced generative language model, and training the knowledge-graph-enhanced generative language model with the question-answer dataset;
the output module is used for inputting the question into the generating type language model module to obtain the answer of the question and outputting the answer;
and adding a language modeling head network for text generation tasks at the last layer of the Bart model to obtain the BartForConditionalGeneration model.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the question-answering method based on the knowledge-enhanced generative language model according to any one of claims 1 to 8.
CN202211421056.5A 2022-11-14 2022-11-14 Question-answering method, system and medium based on knowledge-enhanced generative language model Pending CN115759254A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211421056.5A CN115759254A (en) 2022-11-14 2022-11-14 Question-answering method, system and medium based on knowledge-enhanced generative language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211421056.5A CN115759254A (en) 2022-11-14 2022-11-14 Question-answering method, system and medium based on knowledge-enhanced generative language model

Publications (1)

Publication Number Publication Date
CN115759254A true CN115759254A (en) 2023-03-07

Family

ID=85370270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211421056.5A Pending CN115759254A (en) 2022-11-14 2022-11-14 Question-answering method, system and medium based on knowledge-enhanced generative language model

Country Status (1)

Country Link
CN (1) CN115759254A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842109A (en) * 2023-06-27 2023-10-03 北京大学 Information retrieval knowledge graph embedding method, device and computer equipment
CN116881471A (en) * 2023-07-07 2023-10-13 深圳智现未来工业软件有限公司 Knowledge graph-based large language model fine tuning method and device
CN116881471B (en) * 2023-07-07 2024-06-04 深圳智现未来工业软件有限公司 Knowledge graph-based large language model fine tuning method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination