CN110969010A - Question generation method based on relation guidance and a dual-channel interaction mechanism - Google Patents

Question generation method based on relation guidance and a dual-channel interaction mechanism

Info

Publication number
CN110969010A
CN110969010A (application CN201911238302.1A)
Authority
CN
China
Prior art keywords
answer
article
representation
articles
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911238302.1A
Other languages
Chinese (zh)
Inventor
赵洲
潘启璠
王禹潼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201911238302.1A priority Critical patent/CN110969010A/en
Publication of CN110969010A publication Critical patent/CN110969010A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a question generation method based on relation guidance and a dual-channel interaction mechanism, comprising the following steps: 1) For a set of answers, supporting articles and irrelevant articles, a relation director is constructed that captures the relations between the answers and the corresponding articles. 2) The relation director obtained in step 1) is pre-trained to obtain an answer-related encoder. 3) For a set of answers and articles, word embedding is applied to the articles and answers to obtain their representations. The article and answer representations are fed into a context encoder and into the answer-related encoder obtained in step 2) to obtain context representations of the article, the answer and the answer-related article. These context representations are fed into the dual-channel interaction module to obtain a joint representation of the article. 4) The joint representation of the article obtained in step 3) is input to the question generator and decoded to generate a question. 5) After training, the resulting question generation network can generate a corresponding question given an answer and an article.

Description

Question generation method based on relation guidance and a dual-channel interaction mechanism
Technical Field
The invention relates to question generation in natural language processing, and in particular to a question generation method based on relation guidance and a dual-channel interaction mechanism.
Background
Article-based question generation is a challenging task that requires generating a correct and fluent question given an answer and a passage of text. Question generation is receiving increasing attention in natural language processing and has many successful applications, such as supplying valuable questions to question answering systems, automatically generating exercise questions, and automatically posing questions in dialogue systems to elicit feedback.
Neural-network-based question generation methods mainly comprise two steps: first, a supervised neural network or hand-crafted rules extract several important sentences from an article; second, an encoder-decoder framework generates a question from the extracted sentences. The prior art mainly has the following shortcomings: labeling a dataset for training a supervised neural network requires substantial manual effort, while existing question-answering datasets differ strongly in question type, language style and other respects; hand-crafted rules are inefficient and difficult to transfer to new domains; and although existing encoder-decoder neural networks achieve good results, they extract only a few answer-relevant sentences to generate questions, neglecting the connection between the answer and the full text. To overcome these deficiencies, the present invention utilizes full-text information to generate answer-related questions.
Disclosure of Invention
The invention aims to overcome the shortcoming of the prior art that a question is generated from only a few answer-relevant sentences while the connection between the answer and the full text is neglected. Specifically, the invention designs a discriminator, termed the relation director, to determine whether an article provides sufficient information to obtain the answer. Viewed from another angle, the relation director aims to find content in the articles that helps to understand the answer, and to identify deceptive articles that are irrelevant to obtaining the answer. The invention then designs a dual-channel interaction module, which performs multi-step modeling of the representations from the original channel and from the relation director channel and uses a control gate to determine the information flow through the two channels. Finally, the invention incorporates the relation director and the dual-channel interaction module into an encoder-decoder framework to predict the question.
The invention adopts the following specific technical scheme:
The question generation method based on relation guidance and a dual-channel interaction mechanism comprises the following steps:
1. For a set of answers, supporting articles and irrelevant articles, construct a relation director that captures the relations between the answers and the corresponding articles; the relation director comprises an answer-related encoder and a question-related score function;
2. Pre-train the relation director obtained in step 1 to obtain a trained relation director, and fix the weight parameters of the answer-related encoder within it to obtain the parameter-fixed answer-related encoder;
3. Construct a question generation network comprising a word embedder, a context encoder, a dual-channel interaction module, a question generator and the parameter-fixed answer-related encoder obtained in step 2;
4. For a set of answers and articles, obtain context representations of the article, the answer and the answer-related article through the word embedder and context encoder of the question generation network and the parameter-fixed answer-related encoder obtained in step 2;
5. Input the context representations of the article, the answer and the answer-related article obtained in step 4 into the dual-channel interaction module of the question generation network to obtain the joint representation G of the article;
6. Input the joint representation of the article obtained in step 5 into the question generator of the network and decode it to generate a question. Question generation is a decoding process: the question generator uses an attention-based long short-term memory (LSTM) decoder that also addresses the out-of-vocabulary problem. The encoder attention memory is the joint representation G of the article obtained in step 5; in each decoding step, the attention result and the previous word are fed into the decoding unit, and the attention result output by the encoder is used to initialize the hidden state of the LSTM in the decoder;
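As an illustrative sketch of this decoding step, the following numpy code computes an attention result over the joint representation G and uses it both to initialize the decoder state and as part of each step's input. The single projection matrix, the dimensions and the mean-pooled query for initialization are assumptions; the LSTM cell itself and the out-of-vocabulary (copy) mechanism are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(h, G, W_att):
    # h: (d,) decoder query; G: (n, d) joint article representation (attention memory)
    scores = G @ (W_att @ h)           # (n,) alignment scores
    alpha = softmax(scores)            # attention distribution over article positions
    return alpha @ G                   # (d,) attention result (context vector)

rng = np.random.default_rng(0)
n, d = 6, 8
G = rng.standard_normal((n, d))        # joint representation from step 5
W_att = rng.standard_normal((d, d))    # illustrative attention projection

# the encoder attention result initializes the decoder hidden state
h0 = attend(G.mean(axis=0), G, W_att)

# each decoding step consumes the attention result and the previous word
prev_word_emb = rng.standard_normal(d)
step_input = np.concatenate([attend(h0, G, W_att), prev_word_emb])
print(step_input.shape)  # (16,)
```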
7. Train the network to obtain the final question generation model, which generates a corresponding question given an answer and a passage of text.
The invention has the following beneficial effects:
Unlike existing encoder-decoder models, the invention uses weakly supervised labels to model the relations between related articles and the answer span, and transfers the learned relations to the question generation system. Specifically, the invention designs a discriminator, termed the relation director, to determine whether an article provides sufficient information to obtain the answer. Viewed from another angle, the relation director aims to find content in the articles that helps to understand the answer, and to identify deceptive articles that are irrelevant to obtaining the answer. The invention further designs a dual-channel interaction module, which performs multi-step modeling of the representations from the original channel and from the relation director channel and uses a control gate to determine the information flow through the two channels. Finally, the invention incorporates the relation director and the dual-channel interaction module into an encoder-decoder framework to predict the question.
Existing question generation methods extract only a few answer-relevant sentences to generate a question, neglecting the connection between the answer and the full text. The invention uses full-text information to generate answer-related questions, overcoming this shortcoming of existing methods.
Drawings
FIG. 1 is a schematic diagram of the relation director;
FIG. 2 is a schematic diagram of the overall structure of the question generation network.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description.
As shown in FIG. 1 and FIG. 2, the question generation method based on relation guidance and a dual-channel interaction mechanism of the present invention comprises the following steps:
step one, aiming at a group of answers, supporting articles and irrelevant articles, a relation director containing the relation between the answers and the corresponding articles is constructed, the structural schematic diagram of the relation director is shown in figure 1 and comprises word embedding, position embedding, an answer-related question encoder and a question-related score function, wherein the answer-related question encoder consists of a self-attention unit, a gated convolutional neural network unit, a bidirectional attention module, a fully-connected feedforward layer and a ReLU activation function; the specific workflow of the relationship director is as follows:
the method comprises the steps of forming a triple group by an answer, a support article and an irrelevant article from a search engine as an input sequence, embedding and representing each word in the triple group by a pre-trained word, coding the position of the word by position embedding, and adding the results of the word embedding and the position embedding to obtain a final representation (p) of the triple group+,a,p-) Where a is the final representation of the answer, p+To support the final representation of the article, p-Is the final representation of the irrelevant article;
the final representation (p) of the triplet+,a,p-) An input answer-dependent encoder comprising a self-attention unit, a gated convolutional neural network unit, and a bi-directional attention module; final representation of the triplet (p)+,a,p-) Encoded by a gated convolutional neural network and a self-attention mechanism according to the following formula:
Figure BDA0002305465850000031
Figure BDA0002305465850000032
Figure BDA0002305465850000033
wherein [ ·]MAnd [ ·]NThe repetition times of the self-attention unit or the gated convolutional neural network unit are respectively M times and N times, each self-attention unit SelfAtt (·) and the gated convolutional neural network unit GatedCNN (·) utilize a residual error mechanism and a layer normalization function, f (·) is a fully-connected feedforward layer, and ReLU is taken as an activation function;
Figure BDA0002305465850000034
for the final representation of the encoded support article,
Figure BDA0002305465850000035
for the final representation of the encoded unrelated article,
Figure BDA0002305465850000036
is the final representation of the encoded answer;
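The encoding stage can be sketched in numpy as follows; the 1×1-projection stand-in for the gated convolution, the omitted layer normalization and feedforward weights, and the small weight scale are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cnn(x, Wa, Wb):
    # gated linear unit: a linear branch modulated by a sigmoid gate
    # (a 1x1 projection stands in for the actual convolution)
    return (x @ Wa) * (1.0 / (1.0 + np.exp(-(x @ Wb))))

def self_att(x):
    # scaled dot-product self-attention with a residual connection
    # (layer normalization omitted for brevity)
    d = x.shape[-1]
    return x + softmax(x @ x.T / np.sqrt(d)) @ x

rng = np.random.default_rng(2)
n, d, N, M = 5, 8, 2, 2                 # N gated-CNN units, M self-attention units
p = rng.standard_normal((n, d))         # final representation of an article
Wa = rng.standard_normal((d, d)) * 0.1
Wb = rng.standard_normal((d, d)) * 0.1

h = p
for _ in range(N):
    h = h + gated_cnn(h, Wa, Wb)        # residual mechanism
for _ in range(M):
    h = self_att(h)
encoded = np.maximum(h, 0.0)            # feedforward layer with ReLU (weights omitted)
print(encoded.shape)  # (5, 8)
```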
p̃⁺ and ã are input to the bidirectional attention module, and the similarity matrix S ∈ R^(n×m) between the supporting article and the answer is obtained according to the following formula:

S_ij = W_s [p̃⁺_i ; ã_j ; p̃⁺_i ⊙ ã_j]

where W_s is a trainable matrix, p̃⁺_i is the i-th vector in the final representation of the encoded supporting article, ã_j is the j-th vector in the final representation of the encoded answer, [;] denotes vector concatenation, ⊙ denotes element-wise multiplication of vectors, and S_ij is the similarity between the i-th vector of the encoded supporting article and the j-th vector of the encoded answer;

The attention-weighted vector â_i of ã with respect to p̃⁺_i is computed as:

α_i = Softmax(S_i,:),  â_i = Σ_j α_ij ã_j

The attention-weighted vector g of p̃⁺ with respect to ã is computed as:

β_i = Softmax(max_row(S_ij)),  g = Σ_i β_i p̃⁺_i

p̃⁺, â and g are combined according to the following formula to obtain the bidirectionally attention-enhanced supporting-article representation p̂⁺:

p̂⁺_i = [p̃⁺_i ; â_i ; p̃⁺_i ⊙ â_i ; p̃⁺_i ⊙ g]
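A numpy sketch of this bidirectional attention computation; the trilinear similarity feature [p_i ; a_j ; p_i ⊙ a_j] and the four-way enhanced combination follow common bidirectional-attention practice and are assumptions consistent with the quantities named in the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(3)
n, m, d = 4, 3, 6
p_tilde = rng.standard_normal((n, d))   # encoded supporting article
a_tilde = rng.standard_normal((m, d))   # encoded answer
Ws = rng.standard_normal(3 * d)         # trainable weights (vector form assumed)

# similarity matrix: S_ij scores [p_i ; a_j ; p_i * a_j]
S = np.empty((n, m))
for i in range(n):
    for j in range(m):
        S[i, j] = Ws @ np.concatenate([p_tilde[i], a_tilde[j], p_tilde[i] * a_tilde[j]])

# article-to-answer attention: weighted answer vectors a_hat
alpha = softmax(S, axis=1)
a_hat = alpha @ a_tilde                 # (n, d)

# answer-to-article attention: beta = Softmax(max_row(S)), g = sum_i beta_i p_i
beta = softmax(S.max(axis=1))
g = beta @ p_tilde                      # (d,)

# enhanced supporting-article representation (four-way combination assumed)
p_enh = np.concatenate([p_tilde, a_hat, p_tilde * a_hat, p_tilde * g[None, :]], axis=1)
print(S.shape, p_enh.shape)  # (4, 3) (4, 24)
```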
In the same way, p̃⁻ and ã are input to the bidirectional attention module to obtain the bidirectionally attention-enhanced irrelevant-article representation p̂⁻.
1.3) A feedforward layer is adopted to score the enhanced representation of the supporting article and the enhanced representation of the irrelevant article, computed with the following question-related score functions:

s⁺ = Sigmoid(W_(p⁺,a) MeanPooling(p̂⁺))
s⁻ = Sigmoid(W_(p⁻,a) MeanPooling(p̂⁻))

where W_(p⁺,a) and W_(p⁻,a) are trainable weight matrices, s⁺ is the relevance score of the supporting article-answer combination, s⁻ is the relevance score of the irrelevant article-answer combination, MeanPooling is the average pooling operation, and Sigmoid is the sigmoid activation function;
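The question-related score function can be sketched as mean pooling followed by a linear layer and a sigmoid; the vector form of the trainable weights and the dimensions are assumptions.

```python
import numpy as np

def relevance_score(p_enh, W):
    # MeanPooling over positions, then a linear layer and the sigmoid activation
    pooled = p_enh.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-(W @ pooled)))

rng = np.random.default_rng(5)
p_enh_pos = rng.standard_normal((4, 24))   # enhanced supporting-article representation
p_enh_neg = rng.standard_normal((4, 24))   # enhanced irrelevant-article representation
W = rng.standard_normal(24)                # trainable weights (vector form assumed)

s_pos = relevance_score(p_enh_pos, W)      # score of the supporting article-answer pair
s_neg = relevance_score(p_enh_neg, W)      # score of the irrelevant article-answer pair
print(0.0 < s_pos < 1.0, 0.0 < s_neg < 1.0)  # True True
```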
step two, pre-training the relation director obtained in the step one to obtain a trained relation director, and fixing the weight parameters of the answer related encoders in the relation director to obtain the answer related encoders with fixed parameters;
step three, constructing a question generation network, as shown in fig. 2, wherein the question generation network comprises a word embedding device, a context encoder, a two-channel interaction module, a question generator and an answer related encoder with fixed parameters obtained in the step two; the specific workflow of the problem generation network is as follows:
(1) Apply word embedding to a set of answers and articles to obtain the representation of the article and the representation of the answer;
(2) Input the representation of the article and the representation of the answer into the context encoder, where two weight-sharing bidirectional long short-term memory modules perform context encoding, and output the context representation of the article and the context representation of the answer;
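The weight-sharing context encoding can be sketched as follows; a plain tanh recurrence stands in for the bidirectional LSTM (gates omitted), and the dimensions and weight scales are illustrative.

```python
import numpy as np

def bi_rnn(x, W, U):
    # minimal bidirectional recurrence; a tanh RNN stands in for the LSTM
    def run(seq):
        h, out = np.zeros(W.shape[0]), []
        for t in seq:
            h = np.tanh(W @ h + U @ t)
            out.append(h)
        return np.stack(out)
    fwd = run(x)
    bwd = run(x[::-1])[::-1]
    return np.concatenate([fwd, bwd], axis=1)   # forward and backward states

rng = np.random.default_rng(6)
d, hdim = 8, 5
W = rng.standard_normal((hdim, hdim)) * 0.3
U = rng.standard_normal((hdim, d)) * 0.3        # ONE set of weights ...

article_rep = rng.standard_normal((7, d))
answer_rep = rng.standard_normal((3, d))
article_ctx = bi_rnn(article_rep, W, U)          # ... shared by both inputs,
answer_ctx = bi_rnn(answer_rep, W, U)            # as the weight sharing describes
print(article_ctx.shape, answer_ctx.shape)  # (7, 10) (3, 10)
```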
(3) Input the article representation and the answer representation into the parameter-fixed answer-related encoder obtained in step two, and output the context representation of the answer-related article.
(4) Input the context representations of the article, of the answer and of the answer-related article into the dual-channel interaction module. The dual-channel interaction module comprises two channels: an original interaction channel and a transfer interaction channel. The original interaction channel consists of an original interaction unit followed by a context encoder, the transfer interaction channel consists of a transfer interaction unit followed by a context encoder, and the output sides of the original interaction unit, the transfer interaction unit and the context encoders are each followed by a linear layer and a ReLU activation function.
The context representation of the article and the context representation of the answer are input into the original interaction channel, and the relevance score of the article-answer combination together with the context representation of the answer are input into the transfer interaction channel. The original interaction unit, the transfer interaction unit and the two context encoders are repeated K times with a residual mechanism, yielding the output x of the original interaction channel and the output y of the transfer interaction channel. Both the original interaction unit and the transfer interaction unit use bidirectional attention modules.
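The two channels can be sketched as follows; the bidirectional-attention interaction unit is reduced to a simple attention read, the per-step context encoder is elided, and feeding the answer-related context representation (rather than the raw relevance score) into the transfer channel is a simplifying assumption.

```python
import numpy as np

def interact(x, y):
    # stand-in for the bidirectional attention interaction unit: attention read of y by x
    att = np.exp(x @ y.T)
    att /= att.sum(axis=1, keepdims=True)
    return att @ y

def channel(x, y, K, Wl):
    # interaction unit -> linear layer + ReLU, repeated K times with residuals
    # (the per-step context encoder is elided)
    for _ in range(K):
        x = x + np.maximum(interact(x, y) @ Wl, 0.0)
    return x

rng = np.random.default_rng(7)
n, m, d, K = 4, 3, 6, 2
article_ctx = rng.standard_normal((n, d))
answer_ctx = rng.standard_normal((m, d))
answer_related_ctx = rng.standard_normal((n, d))
Wl = rng.standard_normal((d, d)) * 0.1

x = channel(article_ctx, answer_ctx, K, Wl)          # original interaction channel
y = channel(answer_related_ctx, answer_ctx, K, Wl)   # transfer interaction channel
print(x.shape, y.shape)  # (4, 6) (4, 6)
```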
(5) The output x of the original interaction channel and the output y of the transfer interaction channel are combined by a control gate according to the following formulas to obtain the joint representation G of the article:

g = σ(W_g [x; y] + b_g)
G = g · x + (1 − g) · y

where W_g and b_g are trainable parameters, σ is the activation function, g is the control gate, G is the joint representation of the article, · denotes element-wise multiplication, and [;] denotes vector concatenation.
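The control gate above, written directly in numpy with illustrative dimensions and randomly initialized parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(8)
n, d = 4, 6
x = rng.standard_normal((n, d))         # output of the original interaction channel
y = rng.standard_normal((n, d))         # output of the transfer interaction channel
Wg = rng.standard_normal((d, 2 * d))    # trainable parameters
bg = rng.standard_normal(d)

g = sigmoid(np.concatenate([x, y], axis=1) @ Wg.T + bg)  # g = sigma(Wg [x; y] + bg)
G = g * x + (1.0 - g) * y                                # G = g*x + (1-g)*y
print(G.shape)  # (4, 6)
```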
(6) Input the joint representation G of the article obtained in step (5) into the question generator and decode it to generate a question. Question generation is a decoding process that uses an attention-based long short-term memory decoder and addresses the out-of-vocabulary problem. The encoder attention memory is the joint representation G of the article obtained in step (5); in each decoding step, the attention result and the previous word are fed into the decoding unit, and the attention result output by the encoder is used to initialize the hidden state of the long short-term memory in the decoder.
The method is applied in the following embodiment to demonstrate its technical effects; the detailed steps are as described above and are not repeated.
Examples
Experiments were performed on the MS MARCO and SQuAD datasets. MS MARCO is a large-scale dataset collected from Bing, containing 1,010,916 questions and 8,841,823 related articles extracted from 3,563,535 web documents. The invention uses MS MARCO to train the relation director. In the MS MARCO dataset, each question-answer combination is accompanied by 10 articles obtained from the search engine, and some of these articles are insufficient to derive an answer to the question; the invention labels the irrelevant articles as negative examples and the supporting articles as positive examples. SQuAD is one of the most influential reading comprehension datasets, containing over 100,000 questions posed on 536 Wikipedia articles, with the answer spans contained in the articles. The invention splits the whole SQuAD dataset at the article level into a training set (80%), a development set (10%) and a test set (10%).
To objectively evaluate the performance of the algorithm, the effect of the invention is evaluated automatically on the selected test set with six metrics, namely BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE-L and METEOR, and three manual evaluation criteria, fluency, accuracy and Turing@1, are introduced to verify the model. Following the steps described above, the automatic evaluation results on the six metrics are shown in Table 1, and the experimental results for the fluency, accuracy and Turing@1 criteria, compared with existing methods, are shown in Tables 2, 3 and 4 respectively:

Table 1 Automatic evaluation results on six metrics
BLEU-1: 32.65  BLEU-2: 22.14  BLEU-3: 15.86  BLEU-4: 12.03  ROUGE-L: 32.36  METEOR: 20.25

Table 2 Experimental results for the fluency criterion
Model:    Seq2Seq  Seq2Seq+Attention  Seq2Seq+Attention+Copy  Ours
Fluency:  1.09     2.83               3.84                    4.14

Table 3 Experimental results for the accuracy criterion
Model:     Seq2Seq  Seq2Seq+Attention  Seq2Seq+Attention+Copy  Ours
Accuracy:  0.01     0.21               0.33                    0.53

Table 4 Experimental results for the Turing@1 criterion
Model:     Seq2Seq  Seq2Seq+Attention  Seq2Seq+Attention+Copy  Ours
Turing@1:  0.8%     7.8%               32.6%                   58.9%

Claims (7)

1. A question generation method based on relation guidance and a dual-channel interaction mechanism, characterized by comprising the following steps:
1) For a set of answers, supporting articles and irrelevant articles, construct a relation director that captures the relations between the answers and the corresponding articles; the relation director comprises an answer-related encoder and a question-related score function;
2) Pre-train the relation director obtained in step 1) to obtain a trained relation director, and fix the weight parameters of the answer-related encoder within it to obtain the parameter-fixed answer-related encoder;
3) Construct a question generation network comprising word embedding, a context encoder, a dual-channel interaction module, a question generator and the parameter-fixed answer-related encoder obtained in step 2);
4) For a set of answers and articles, obtain context representations of the article, the answer and the answer-related article through the word embedding and context encoder of the question generation network and the parameter-fixed answer-related encoder obtained in step 2);
5) Input the context representations of the article, the answer and the answer-related article obtained in step 4) into the dual-channel interaction module of the question generation network to obtain the joint representation of the article;
6) Input the joint representation of the article obtained in step 5) into the question generator of the network and decode it to generate a question;
7) Train the network to obtain the final question generation model, which generates a corresponding question given an answer and a passage of text.
2. The question generation method based on relation guidance and a dual-channel interaction mechanism according to claim 1, wherein step 1) is specifically as follows:
1.1) An answer, a supporting article and an irrelevant article from a search engine form a triple serving as the input sequence; each word in the triple is represented by a pre-trained word embedding, word positions are encoded by position embeddings, and the two are summed to obtain the final representation (p⁺, a, p⁻) of the triple, where a is the final representation of the answer, p⁺ that of the supporting article and p⁻ that of the irrelevant article;
1.2) The final representation (p⁺, a, p⁻) of the triple is input to the answer-related encoder, which comprises a self-attention unit, a gated convolutional neural network unit and a bidirectional attention module; the triple is encoded by the gated convolutional neural network and the self-attention mechanism according to the following formulas:

p̃⁺ = f([SelfAtt([GatedCNN(p⁺)]_N)]_M)
ã = f([SelfAtt([GatedCNN(a)]_N)]_M)
p̃⁻ = f([SelfAtt([GatedCNN(p⁻)]_N)]_M)

where [·]_M and [·]_N denote M repetitions of the self-attention unit and N repetitions of the gated convolutional neural network unit respectively, each self-attention unit SelfAtt(·) and each gated convolutional neural network unit GatedCNN(·) uses a residual mechanism and layer normalization, and f(·) is a fully connected feedforward layer with ReLU as the activation function; p̃⁺ is the final representation of the encoded supporting article, p̃⁻ the final representation of the encoded irrelevant article, and ã the final representation of the encoded answer;
p̃⁺ and ã are input to the bidirectional attention module, and the similarity matrix S ∈ R^(n×m) between the supporting article and the answer is obtained according to the following formula:

S_ij = W_s [p̃⁺_i ; ã_j ; p̃⁺_i ⊙ ã_j]

where W_s is a trainable matrix, p̃⁺_i is the i-th vector in the final representation of the encoded supporting article, ã_j is the j-th vector in the final representation of the encoded answer, [;] denotes vector concatenation, ⊙ denotes element-wise multiplication of vectors, and S_ij is the similarity between the i-th vector of the encoded supporting article and the j-th vector of the encoded answer;

The attention-weighted vector â_i of ã with respect to p̃⁺_i is computed as:

α_i = Softmax(S_i,:),  â_i = Σ_j α_ij ã_j

The attention-weighted vector g of p̃⁺ with respect to ã is computed as:

β_i = Softmax(max_row(S_ij)),  g = Σ_i β_i p̃⁺_i

p̃⁺, â and g are combined according to the following formula to obtain the bidirectionally attention-enhanced supporting-article representation p̂⁺:

p̂⁺_i = [p̃⁺_i ; â_i ; p̃⁺_i ⊙ â_i ; p̃⁺_i ⊙ g]
In the same way, p̃⁻ and ã are input to the bidirectional attention module to obtain the bidirectionally attention-enhanced irrelevant-article representation p̂⁻.
1.3) A feedforward layer is adopted to score the enhanced representation of the supporting article and the enhanced representation of the irrelevant article, computed with the following question-related score functions:

s⁺ = Sigmoid(W_(p⁺,a) MeanPooling(p̂⁺))
s⁻ = Sigmoid(W_(p⁻,a) MeanPooling(p̂⁻))

where W_(p⁺,a) and W_(p⁻,a) are trainable weight matrices, s⁺ is the relevance score of the supporting article-answer combination, s⁻ is the relevance score of the irrelevant article-answer combination, MeanPooling is the average pooling operation, and Sigmoid is the sigmoid activation function.
3. The question generation method based on relation guidance and a dual-channel interaction mechanism according to claim 1, wherein step 2) is specifically as follows:
Design the loss function L_r:

L_r = Σ_{(p⁺,a,p⁻)∈R} max(0, c − s⁺ + s⁻)

where s⁺ and s⁻ denote the relevance score of the supporting article-answer combination and the relevance score of the irrelevant article-answer combination respectively, c is a predefined hyperparameter, and R is the set of supporting article-answer-irrelevant article combinations;
Pre-train the relation director obtained in step 1) with this loss to obtain a trained relation director, and fix the parameters of the answer-related encoder within it to obtain the parameter-fixed answer-related encoder.
4. The question generation method based on relation guidance and a dual-channel interaction mechanism according to claim 1, wherein step 4) is specifically as follows:
4.1) Apply word embedding to a set of answers and articles to obtain the representation of the article and the representation of the answer;
4.2) Input the representation of the article and the representation of the answer into the context encoder, where two weight-sharing bidirectional long short-term memory modules perform context encoding, and output the context representation of the article and the context representation of the answer;
4.3) Input the article representation and the answer representation into the parameter-fixed answer-related encoder obtained in step 2), and output the context representation of the answer-related article.
5. The question generation method based on relation guidance and a dual-channel interaction mechanism according to claim 1, wherein step 5) is specifically as follows:
5.1) inputting the context representation of the article, the context representation of the answer and the context representation of the article related to the answer obtained in the step 4) into a dual-channel interaction module; the dual-channel interaction module comprises two channels: an original interaction channel and a transfer interaction channel; the original interactive channel is formed by sequentially connecting an original interactive unit and a context encoder, the transfer interactive channel is formed by sequentially connecting a transfer interactive unit and a context encoder, and the output sides of the original interactive unit, the transfer interactive unit and the context encoder are also respectively connected with a linear layer and a ReLU activation function;
inputting the context representation of the article and the context representation of the answer into the original interaction channel of the dual-channel interaction module, inputting the relevance score of the article-answer combination and the context representation of the answer into the transfer interaction channel of the dual-channel interaction module, and repeating the above steps K times with a residual mechanism over the original interaction unit, the transfer interaction unit and the two context encoders in the dual-channel interaction module to obtain the output x of the original interaction channel and the output y of the transfer interaction channel;
5.2) combining the output x of the original interaction channel and the output y of the transfer interaction channel with a control gate according to the following formulas to obtain the joint representation G of the article:
g = σ(W_g[x; y] + b_g)
G = g · x + (1 − g) · y
where W_g and b_g are trainable parameters, σ is the sigmoid activation function, g is the control gate, G is the joint representation of the article, and [;] denotes vector concatenation.
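The gated combination of step 5.2) can be sketched directly from the two formulas above. The dimensions and the random initialization below are illustrative toy choices; in the patented method W_g and b_g are learned during training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(x, y, W_g, b_g):
    """Combine the original-channel output x and the transfer-channel
    output y with a control gate: g = sigmoid(W_g[x; y] + b_g),
    G = g * x + (1 - g) * y."""
    g = sigmoid(np.concatenate([x, y], axis=-1) @ W_g + b_g)  # gate in (0, 1)
    return g * x + (1.0 - g) * y                              # joint representation G

# toy dimensions (hypothetical): sequence length 4, hidden size 3
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # original interaction channel output
y = rng.normal(size=(4, 3))            # transfer interaction channel output
W_g = rng.normal(size=(6, 3)) * 0.1    # maps [x; y] (dim 6) to gate (dim 3)
b_g = np.zeros(3)

G = gated_fusion(x, y, W_g, b_g)
```

Because g lies strictly between 0 and 1, each element of G is a convex combination of the corresponding elements of x and y, so the gate smoothly interpolates between the two channels rather than selecting one outright.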
6. The problem generation method based on the relationship guidance and the dual-channel interaction mechanism as claimed in claim 5, wherein the original interaction unit and the transfer interaction unit both use a bidirectional attention module.
7. The method of claim 1, wherein the problem generator in step 6) uses an attention-based long short-term memory decoder to address the out-of-vocabulary problem.
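Claim 7 specifies an attention-based LSTM decoder that addresses out-of-vocabulary words. One common realization of this idea, not necessarily the one used in the patent, is a copy (pointer) distribution that lets the decoder emit source tokens whose ids fall outside the fixed vocabulary. The sketch below shows a single decoding step under that assumption; all names, weights and dimensions are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pointer_step(h_dec, H_src, src_ids, W_att, vocab_logits, p_gen):
    """One decoding step mixing a fixed-vocabulary distribution with a
    copy distribution over source positions (a pointer-style sketch).
    Source tokens with id >= V extend the output space, which is how
    such decoders can emit out-of-vocabulary words."""
    att = softmax(H_src @ W_att @ h_dec)       # attention over source positions
    V = vocab_logits.shape[0]
    V_ext = max(V, int(src_ids.max()) + 1)     # extended vocabulary incl. OOV ids
    p = np.zeros(V_ext)
    p[:V] = p_gen * softmax(vocab_logits)      # generate from the fixed vocabulary
    for pos, tok in enumerate(src_ids):        # copy source tokens via attention
        p[tok] += (1.0 - p_gen) * att[pos]
    return p

rng = np.random.default_rng(3)
H_src = rng.normal(size=(6, 4))                # encoder states for 6 source tokens
h_dec = rng.normal(size=4)                     # current decoder hidden state
W_att = rng.normal(scale=0.1, size=(4, 4))
src_ids = np.array([2, 7, 1, 9, 9, 0])         # ids 7 and 9 are OOV (V = 5)
p = pointer_step(h_dec, H_src, src_ids, W_att, rng.normal(size=5), p_gen=0.7)
```

Even though the fixed vocabulary has only 5 entries, the resulting distribution assigns nonzero probability to the OOV ids 7 and 9 that appear in the source, so the decoder can output those words by copying.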
CN201911238302.1A 2019-12-06 2019-12-06 Problem generation method based on relationship guidance and dual-channel interaction mechanism Withdrawn CN110969010A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911238302.1A CN110969010A (en) 2019-12-06 2019-12-06 Problem generation method based on relationship guidance and dual-channel interaction mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911238302.1A CN110969010A (en) 2019-12-06 2019-12-06 Problem generation method based on relationship guidance and dual-channel interaction mechanism

Publications (1)

Publication Number Publication Date
CN110969010A true CN110969010A (en) 2020-04-07

Family

ID=70033105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911238302.1A Withdrawn CN110969010A (en) 2019-12-06 2019-12-06 Problem generation method based on relationship guidance and dual-channel interaction mechanism

Country Status (1)

Country Link
CN (1) CN110969010A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763444A (en) * 2018-05-25 2018-11-06 杭州知智能科技有限公司 The method for solving video question and answer using hierarchical coding decoder network mechanism
CN108846130A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 A kind of question text generation method, device, equipment and medium
CN109684452A (en) * 2018-12-25 2019-04-26 中科国力(镇江)智能技术有限公司 A kind of neural network problem generation method based on answer Yu answer location information
US20190251168A1 (en) * 2018-02-09 2019-08-15 Salesforce.Com, Inc. Multitask Learning As Question Answering
CN110134771A (en) * 2019-04-09 2019-08-16 广东工业大学 A kind of implementation method based on more attention mechanism converged network question answering systems


Non-Patent Citations (1)

Title
YUTONG WANG ET AL.: "Weak Supervision Enhanced Generative Network for Question Generation", 28th International Joint Conference on Artificial Intelligence (IJCAI-19) *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN112287978A (en) * 2020-10-07 2021-01-29 武汉大学 Hyperspectral remote sensing image classification method based on self-attention context network
CN112287978B (en) * 2020-10-07 2022-04-15 武汉大学 Hyperspectral remote sensing image classification method based on self-attention context network
US11783579B2 (en) 2020-10-07 2023-10-10 Wuhan University Hyperspectral remote sensing image classification method based on self-attention context network
CN115080715A (en) * 2022-05-30 2022-09-20 重庆理工大学 Span extraction reading understanding method based on residual error structure and bidirectional fusion attention

Similar Documents

Publication Publication Date Title
CN109657041B (en) Deep learning-based automatic problem generation method
CN107239446B (en) A kind of intelligence relationship extracting method based on neural network Yu attention mechanism
CN109766427B (en) Intelligent question-answering method based on collaborative attention for virtual learning environment
CN110390397B (en) Text inclusion recognition method and device
CN109992657B (en) Dialogue type problem generation method based on enhanced dynamic reasoning
CN107562792A (en) A kind of question and answer matching process based on deep learning
CN112559702B (en) Method for generating natural language problem in civil construction information field based on Transformer
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN105843801A (en) Multi-translation parallel corpus construction system
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN111563146A (en) Inference-based difficulty controllable problem generation method
CN116596347B (en) Multi-disciplinary interaction teaching system and teaching method based on cloud platform
CN110969010A (en) Problem generation method based on relationship guidance and dual-channel interaction mechanism
CN114548053A (en) Text comparison learning error correction system, method and device based on editing method
CN114328853B (en) Chinese problem generation method based on Unilm optimized language model
CN110321568A (en) The Chinese-based on fusion part of speech and location information gets over convolutional Neural machine translation method
CN112668344B (en) Complexity-controllable diversified problem generation method based on mixed expert model
CN114692615A (en) Small sample semantic graph recognition method for small languages
Chen A deep learning-based intelligent quality detection model for machine translation
Liu et al. Semantic Repeatability Screening Mechanism of Intelligent Learning Platform Based on Bi-LSTM.
CN111428499A (en) Idiom compression representation method for automatic question-answering system by fusing similar meaning word information
Mu Gated Recurrent Unit Framework for Ideological and Political Teaching System in Colleges
CN110929265B (en) Multi-angle answer verification method for reading, understanding, asking and answering
Alissa et al. Text simplification using transformer and BERT
Nie et al. Predicting Reading Comprehension Scores of Elementary School Students.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200407
