CN114168749A

CN114168749A - Question generation system based on knowledge graph and question word drive

Info

Publication number: CN114168749A
Application number: CN202111475261.5A
Authority: CN
Inventors: 荣文戈; 周世杰; 欧阳元新; 熊璋
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-12-06
Filing date: 2021-12-06
Publication date: 2022-03-11

Abstract

The invention discloses a problem generation system based on knowledge graph and question word drive, comprising: the text preprocessing module is used for preprocessing the text; the one-hop knowledge map construction module is used for constructing a one-hop knowledge map based on the preprocessed text; the attention vector calculation module is used for calculating a static map attention vector based on a one-hop knowledge map; a feature enhanced encoder; the gated attention mechanism module is used for storing additional context semantic information of the encoder part and further expanding the context semantic of the input text; the decoder is used for decoding the intermediate state one-dimensional vector coded by the coder so as to output final word probability distribution; a knowledge matching module; the semantic search space matching module is used for calculating the semantic similarity of the question and the answer; and the query word prediction module is used for predicting the query words corresponding to the input text.

Description

Question generation system based on knowledge graph and question word drive

Technical Field

The invention belongs to the technical field of natural language processing research, and particularly relates to a problem generation system based on knowledge map and question word drive.

Background

In recent years, with the tremendous increase of computer hardware computing power and the deep progress of deep learning research, the natural language generation technology has greatly progressed. The problem Generation technology has also achieved certain results as one of the most important links of Natural Language Generation (NLG), and a number of data-driven deep learning models have been created. With the popularization of artificial intelligence application, people have stronger and stronger cravings for man-machine question answering, and a problem generation system is the most complex and extremely challenging ring in artificial intelligence, especially natural language processing. On the one hand, the generated questions must be able to catch the topic of the question and answer and the relevant facts, and on the other hand, the model generated questions must be highly rich and diverse to ensure a high quality user experience.

Knowledge maps have been shown to greatly improve the performance of Natural Language Processing (NLP) models. In the daily chat or conversation process, the question raising is a very common scenario. Therefore, generating a proper and meaningful question is critical to the automated question and answer technology. The question generation plays an extremely important role in the question and answer task, aims to generate questions related to the text according to the given input text, and is widely applied to the fields of question and answer systems, dialog systems, chat robots and the like. In daily chatting, a problem is thrown, and the topic of the chatting can be determined, so that subsequent conversations can be better carried out; in a search engine, people often input a question, expecting to obtain relevant answers and retrieve contents; in the intelligent customer service system, the system can automatically generate the problems associated with the keywords input by the user, and provides the user for searching, thereby greatly improving the customer service efficiency. In recent years, various problem generation models have been proposed by many scholars, however, the problem of semantic mismatch, especially the question words of the problem, still occurs. Whether the query word is correct or not directly determines whether the semantics of a question are clear and definite. For example, for The place "The Forbidden City", The generated problem needs to begin with "where", otherwise, The problems of unclear semantics and no ambiguity occur, and The user experience and The model performance are seriously affected. On the other hand, whether the semantics of the question are rich is also one of the important factors for determining the quality of the question generation model. In the question-answer scenario, both the question and the answer are often discussing something, and have a certain relevance, for example, for the answer "Ilike applets best all", the question is often asked around "front". Thus, fusing knowledge into the question generation model may expand the semantics of the input text to generate a higher quality question.

The difficulty of current research in the field of problem generation is mainly that: 1) the semantic meaning of the generated problem is not rich enough, and the problem of boring and boring is often generated; 2) models are prone to generating problems that are not sufficiently close or wrong, such as wrong query words, semantically irrelevant problems, etc., causing ambiguity or misunderstanding. In conclusion, the introduction of external knowledge and predictive query words into the problem generation system has certain prospects, so that the direction is selected as the research focus of the invention.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: the problem that problems generated in a traditional problem generation system based on a neural network are too general and easy to separate from problems is solved, and meanwhile, semantic information of context is enhanced and the generated semantics are richer is solved. The performance of the model is improved by a one-hop-based graph attention machine mechanism and three auxiliary tasks, and meanwhile the prediction accuracy of the questioning words is improved. The first core point of the question generation system is how to generate questions in accordance with the question-answer semantics. Compared with other existing problem generation system models, the problem generated by the method is richer in semantics and closer to question and answer facts. Other problem generation systems often generate relatively boring, general or off-topic problems, which greatly reduces user experience. The second core point of the problem generation system is how to enhance the context semantics, and the enhancement of the context semantics can make the problem generated by the model more real. Therefore, the semantic information of the context is enhanced by introducing a one-hop knowledge graph structure, a knowledge matching module and a semantic search space matching module, and finally the purpose of improving the system performance is achieved.

The system jointly learns the problem generation task under the multi-task learning framework and outputs the final result. Specifically, the system is designed from four aspects under the framework of multitask learning: firstly, a gating self-attention mechanism is designed, and the mechanism can dynamically and self-adaptively acquire the semantic information of the context, so that the encoding performance of an encoder is improved; secondly, a separate auxiliary task, namely a knowledge matching module, is designed to promote the model to be capable of paying attention to the fact information most relevant to the question and answer; thirdly, a semantic search space matching module is constructed to shorten the distance between the question and the answer in the semantic search space; fourthly, a questioning word predicting module is constructed, which aims to predict and output the questioning words corresponding to the problems, and further improves the quality of the generated problems. Through a multi-task learning mechanism, the invention can generate more appropriate problems.

The technical scheme for solving the technical problems comprises the following steps: a knowledge-graph and question-word-driven question generation system, comprising:

the text preprocessing module is used for preprocessing the text;

the one-hop knowledge map construction module is used for constructing a one-hop knowledge map based on the preprocessed text;

the attention vector calculation module is used for calculating a static map attention vector based on a one-hop knowledge map;

the characteristic-enhanced encoder is characterized in that a one-hop knowledge map is converted into a one-dimensional static word embedding vector, and then the one-dimensional static word embedding vector is spliced with a word embedding vector of an input text, an answer position information vector of the input text and a vocabulary characteristic information vector to serve as net input of the encoder so as to enhance the capability of the encoder for acquiring context semantic information;

the gated attention mechanism module is used for storing additional context semantic information of the encoder part and further expanding the context semantic of the input text;

the decoder is used for decoding the intermediate state one-dimensional vector coded by the coder so as to output final word probability distribution;

a knowledge matching module; to generate semantically more relevant questions;

the semantic search space matching module is used for calculating the semantic similarity of the question and the answer;

and the query word prediction module is used for predicting the query words corresponding to the input text.

According to another aspect of the invention, a problem generation method based on knowledge graph and question word drive is provided, and the method comprises the following specific steps:

the method comprises the following steps of (1) preprocessing a text, and specifically comprises the following steps:

and uniformly processing the text formats, namely processing all texts, deleting redundant spaces before, after and in the middle, and removing non-English letter symbols. Encoding each word into a 300-dimensional word embedding form by using GlobalVectors for WordRepression (GloVe) encoding, wherein the vocabulary size of the GloVe is selected to be 30000, and the unknown word is expressed as < UNK >.

Step (2), constructing a one-hop knowledge graph, which comprises the following specific steps:

selecting a concept net large-scale common sense graph as a knowledge base, searching one-hop nodes in the common sense graph aiming at each word of an input text, wherein the number of the nodes is fixed to be 60. A bottom triplet NOT _ a _ FACT is used to represent triples that do NOT match to any entity. In this way, a one-hop knowledge graph is obtained that is composed of triplets. At the same time, a copy of the one-hop knowledge-graph is retained.

And (3) calculating the attention vector of the static graph based on the one-hop knowledge graph. Each word in the input sentence is matched to its corresponding multi-hop map and the multi-hop map is converted to a corresponding static map attention vector for input into the encoder structure. Let K be K ═ K₁,...,k_|K|The triple vector of the knowledge map is represented as k_i＝(h_i,r_i,t_i) Where, | K | represents the number of triples in the set K, h_i、r_iAnd t_iAre respectively triplets k_iCorresponding head, relation and tail vectors, i ∈ [1, | K]. In order to obtain the graph embedding vector corresponding to the one-hop knowledge graph, a one-hop knowledge graph set is obtained firstly

Wherein x is an input sequence; | x | is the length of the input sequence; one represents the relative symbol of the one-hop knowledge-graph, as follows.

One-hop knowledge graph at time t

The corresponding three tuple sets are

Wherein the content of the first and second substances,

is a set

The number of elements contained in the steel;

is provided with

And

j-th triplet at respective time t

The included head, relationship and tail vectors. Therefore, the final one-hop static map attention vector g at time t can be calculated by the following formula_t。

Wherein, g_tRepresenting the attention vector of the one-jump static graph corresponding to the input at the time t; alpha is alpha_tiThe attention scores of the 0-hop entity and the ith one-hop entity input at the time t; exp (·) is an exponential function with a natural constant e as base; τ (-) is a bilinear attention function; [;]is the concatenation symbol of the vector.

And (4) constructing a characteristic enhanced encoder. The encoder uses a bidirectional Long Short-Term Memory (LSTM) for encoding, and the calculation is as follows:

wherein the content of the first and second substances,

and

respectively representing hidden layer vectors of the LSTM at the t moment; enc is the mark of the encoder, the same as below;

the input vector after splicing; x is the number of_t、g_t、m_tAnd l_tRespectively representing a word embedding vector at the time t, a corresponding one-hop static graph attention vector, an answer position information vector and a vocabulary characteristic information vector.

And (5) constructing a gated self-attention mechanism module. The invention designs a gating self-attention mechanism, aiming at enhancing the capability of an encoder for acquiring context semantic information. Firstly, a vector matrix of a hidden layer of an encoder is obtained

Wherein the content of the first and second substances,

d is the dimension of the LSTM hidden layer state; | x | is the length of the input sequence x;

is provided with

[；]Are vector concatenators. Then, a self-attention moment array is obtained by a self-attention algorithm

Finally, a gate control unit controls the finally generated self-attention moment array

Wherein the content of the first and second substances,

and

respectively representing a matrix obtained from the attention algorithm, a matrix obtained after fusing the original matrix and the attention algorithm, and a matrix finally obtained through a gating mechanism, and

| x | is the length of the input sequence x; q, K and V are both state parameter matrices calculated from the attention scores; softmax (-), tanh (-), MLP (-), and σ (-), represent the Softmax function, tanh function, the multi-layer perceptron function, and the Sigmoid function, respectively;

a connection symbol representing a matrix; an indicator indicating a multiplication of a corresponding position of the matrix; j is a full 1 matrix, and the matrix dimension is consistent with H.

And (6) constructing a decoder part. The invention designs a decoder based on attention mechanism, which is composed of another LSTM, as follows:

wherein the content of the first and second substances,

is the hidden layer vector of the decoder at the moment t; dec is a correlation symbol representing the decoder, the same applies below; y is_t-1Represents the output vector of the decoder at time t-1;

an attention vector representing time t; beta is a_tjRepresenting the attention score between the hidden layer vector at the moment t of the decoder and the jth input sequence of the encoder;

a gated self-attention vector for encoder time j; softmax (-) and τ (-) are the Softmax function and the bilinear attention function, respectively; [;]are vector connectors.

A copy mechanism is introduced at the decoding stage to avoid that the decoder ignores some important low frequency words. I.e. the decoder generates two distributions:

P(o_t)＝(1-μ_t)P_v(o_t＝w_v)+μ_tP_c(o_t＝w_c)

wherein, mu_tA threshold value between 0 and 1; sigma (·), MLP (·) and copy (·) are respectively a Sigmoid function, a multilayer perceptron function and a copy mechanism function; o_tIs the output value at the time t; w is a_vAnd w_cRespectively representing words generated from a vocabulary and generated according to a copy mechanism; subscripts v and c represent bar identifiers generated from the vocabulary and according to the copy mechanism, respectively; [;]are vector connectors.

The loss function of the decoder can be expressed by:

wherein L is_qIs a loss function of the decoder; log (-) is a logarithmic function with a natural constant e as a true number; y is_tIs the output value of the decoder at the moment t; | y | is the length of the decoder output sequence; x is an input sequence; Γ is the set of knowledge-graph fact triple vectors.

And (7) constructing a knowledge matching module. The invention designs an additional knowledge matching module to generate a problem with more relevant semantics. In a question-and-answer scenario, questions and answers tend to have strong semantic relevance and both expand based on certain facts. Given set of knowledge-graph fact triplets vectors

And is

Has f_i＝(h_i,r_i,t_i) Wherein, | Γ | is the number of elements in the triple set Γ; h is_i、r_iAnd t_iAre triplets f respectively_iCorresponding head, relation and tail vectors; h is_iAnd t_iHas a dimension of d_e,r_iHas a dimension of d_r. Thus, the set Γ may constitute oneKnowledge matrix

In the encoder-decoder architecture, the hidden layer state of the encoder can be regarded as prior information, and the hidden layer state of the decoder can be regarded as posterior knowledge, for which, the knowledge matching module calculates two distributions fusing external information, namely prior distribution zeta_priorAnd posterior distribution ζ_postAnd the dimensions of the two distributions are both | Γ |, the ith dimension of the two distributions represents the attention degree of the model to the ith fact triple.

Wherein, Softmax (-) and tanh (-) are Softmax function and tanh function respectively; w_F、W_priorAnd W_postAre all parameter matrices; [;]is a vector connector; an indicator indicating a multiplication of a corresponding position of the matrix; | x | is the length of the input sequence x; | y | is the length of the decoder output sequence y;

is the attention vector at decoder y instant.

And finally, taking Jensen-Shannon Divergence (JS Divergence) of the prior distribution and the posterior distribution as a loss function of the knowledge matching module:

L_k＝JS(ζ_prior||ζ_post)

wherein L is_kA loss function for the knowledge matching module; JS (| |) is the JS divergence function.

And (8) constructing a semantic search space matching module. The invention designs an additional auxiliary task, namely a semantic search space matching module, so as to calculate the semantic similarity of the question and the answer. The parameter matrix W obtained in step (7)_FAnd W_postContains a large amount of a posteriori information, i.e. semantic information related to the problem. For this purpose, the invention designs two mapping functions

And ψ (-) to project the question vector and answer vector to the corresponding semantic search spaces, respectively:

wherein e is_aveThe average value of all word embedding vector splicing vectors of the word embedding layer is obtained; x is the number of_t、g_t、m_tAnd l_tRespectively representing a word embedding vector at the time t, a corresponding one-hop static graph attention vector, an answer position information vector and a vocabulary characteristic information vector; w_s、W_FAnd W_postAre all parameter matrices;

and

the space vectors are searched for the semantics corresponding to the question and the answer, respectively. In addition, there are:

wherein the content of the first and second substances,

and ψ (·) is a mapping function for the question and answer sequences, respectively;

is L-2 norm.

Finally, calculation was done by Kullback-Leibler Divergence (KL Divergence)

And

distance in the projected semantic search space and as a loss function of this module:

wherein L is_sLoss function of this module, D_KL(| |) is the KL divergence function.

And (9) constructing a query word prediction module. The invention designs an additional auxiliary task, namely a query word prediction module, so as to predict the query words corresponding to the input text. The query word prediction module is composed of a classifier, the input of the classifier is an answer label sequence, and a TextCNN model is adopted as the classifier of the query words. Given answer label sequence

Wherein x is_aA starting word representing a sequence of answer labels; | x_ansIf | represents the length of the answer label sequence, the output probability distribution of the predicted query word is calculated as follows:

P_TextCNN(u_t＝s_k)＝Softmax(TextCNN(x_ans))

wherein u is_tAs output of the classifier, s_kTo classifyA tag value in a set of tags; TextCNN (·) is a TextCNN network function.

Finally, the cross entropy is used as the loss function of the module:

wherein Lr is a loss function of the module; log (-) is a logarithmic function with a natural constant e as a true number;

is the answer x_ansCorresponding real query word tags.

To this end, the final loss function of the present invention can be calculated by:

L＝λ_q·L_q+λ_k·L_k+λ_s·L_s+λ_r·L_r

wherein L is the final loss function; lambda [ alpha ]_q、λ_k、λ_sAnd λ_rAre respectively L_q、L_k、L_sAnd L_rThe weight coefficient of (2).

Compared with the prior art, the invention has the advantages that:

1. the invention is optimized from two aspects. In one aspect, to improve the ability of the encoder to obtain context information, the model employs a feature-enhanced encoder. Meanwhile, a gated self-attention mechanism is also provided to further acquire the context semantic information of the input text. The self-attention vector can be regarded as an additional memory network for storing rich contextual information to enhance contextual semantic information of the input text. On the other hand, the model extracts knowledge triplets from the input text and vectorizes them as knowledge fusion embedding layers to further expand the context of the encoding stage. Meanwhile, the invention constructs 3 independent subtask modules which are respectively a knowledge matching module, a semantic search space matching module and a questioning word prediction module. The core idea of the knowledge matching module is to improve the capability of the model to pay attention to the fact related to the input text, and promote the decoder to generate the problem that the semantics are more relevant by introducing an external knowledge spectrogram and the posterior information of a decoding stage. The semantic search space matching module calculates the semantic matching degree between the text and the output problem, and performs combined training by taking the semantic matching degree as a part of a loss function, so that the semantic association degree between the generated problem and the input text is improved, and the generation independence problem is further avoided. Finally, the questioning word predicting module constructs a multitask classifier introducing external knowledge so as to improve the precision rate of predicting the questioning words.

In order to combine the 3 auxiliary subtasks proposed above, i.e., the knowledge matching module, the semantic search space matching module, and the query word prediction module, with the encoder-decoder architecture as the main task, the present invention adopts a multi-task learning strategy to calculate the loss functions of all four tasks, respectively, and takes the weighted and summed loss function as the final loss function of the model.

2. In the decoding stage, the invention strategically proposes a gating mechanism, and the finally generated words can be generated from two places, namely a word stock and an input text. Through a gating mechanism, the cumulative distribution of the two distributions is calculated, so that the finally generated word distribution is more accurate, the semantic information of the context can be captured more accurately, the model performance is greatly improved, and the user experience is improved. The problem of final generation is that the non-high frequency words are not easy to generate from the word list, so the invention obtains the probability of the output words from the word list through a copy mechanism, thereby leading the generated problem semantics to be richer and more appropriate, and simultaneously having a certain information content.

Drawings

FIG. 1 is a system overview chart generated based on knowledge-graph and question-word driven questions;

FIG. 2 is a schematic diagram of a model of a projection of a question vector and an answer vector to a semantic search space;

FIG. 3 is a schematic diagram of a module model for predicting interrogative words.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by a person skilled in the art based on the embodiments of the present invention belong to the protection scope of the present invention without creative efforts.

The invention relates to a question generation system based on knowledge graph and question word drive, and a system model is shown in figure 1. The model is trained under a multi-task learning framework, a main task module is a Seq2Seq framework based on an attention mechanism, and meanwhile, 3 independent auxiliary task modules are introduced, wherein the independent auxiliary task modules are respectively as follows: 1) a knowledge matching module; 2) a semantic search space matching module; 3) a query prediction module. The knowledge matching module fuses knowledge information into hidden layer vectors of an encoder and a decoder, and calculates two distributions, namely a priori distribution and a posterior distribution, so that a model can focus on the knowledge information which is more appropriate to a question and answer scene; the core idea of the semantic space searching module is to reduce the distance between the question vector and the answer vector in the projected semantic searching space so as to ensure that the generated question is semantically as close as possible to the answer; the query word prediction module constructs a simple classifier to predict the query words of the problem, and avoids generating a grammar error or an improper problem.

The method makes full use of the characteristics of a multi-task learning framework, and introduces 3 additional auxiliary tasks for improving the performance of the model. By introducing the knowledge graph and constructing the knowledge graph with one hop, the finally generated problem is more appropriate and the semantic information is richer.

The method firstly cuts the large-scale knowledge graph into a one-jump knowledge graph, and calculates the attention vector of the static graph through the attention mechanism of the static graph, wherein the vector is used for enhancing the semantic information of the input text. And the data is trained by combining with a multi-task learning framework, and the final experimental result of the invention is obviously superior to that of the existing problem generation system.

According to one embodiment of the invention, a system for generating questions based on knowledge-graph and query word driving comprises:

the text preprocessing module is used for preprocessing the text;

a knowledge matching module; to generate semantically more relevant questions;

According to another embodiment of the invention, a knowledge-graph and query word-driven question generation method is provided, which comprises the following steps:

One-hop knowledge graph at time t

The corresponding three tuple sets are

Wherein the content of the first and second substances,

is a set

The number of elements contained in the steel;

is provided with

And

j-th triplet at respective time t

wherein the content of the first and second substances,

and

Wherein the content of the first and second substances,

is provided with

Finally, the process is carried out in a batch,controlling the finally generated self-attention moment array through a gate control unit

Wherein the content of the first and second substances,

and

| x | is the length of the input sequence x; q, K and V are both state parameter matrices calculated from the attention scores; softmax (-), tanh (-), MLP (-), and σ (-), represent the Softmax function, tanh function, the multi-layer perceptron function, and the Sigmoid function, respectively; o denotes a connection symbol of the matrix; an indicator indicating a multiplication of a corresponding position of the matrix; j is a full 1 matrix, and the matrix dimension is consistent with H.

wherein the content of the first and second substances,

P(o_t)＝(1-μ_t)P_v(o_t＝w_v)+μ_tP_c(o_t＝w_c)

The loss function of the decoder can be expressed by:

And is

Has f_i＝(h_i,r_i,t_i) Wherein, | Γ | is the number of elements in the triple set Γ; h is_i、r_iAnd t_iAre triplets f respectively_iCorresponding head, relation and tail vectors; h is_iAnd t_iHas a dimension of d_e,r_iHas a dimension of d_r. Thus, the set Γ may constitute a knowledge matrix

is the attention vector at decoder y instant.

L_k＝JS(ζ_prior||ζ_post)

And ψ (-) to project the question vector and the answer vector to the corresponding semantic search spaces, respectively, fig. 2 is a schematic diagram of a model in which the question vector and the answer vector are projected to the semantic search spaces:

and

wherein the content of the first and second substances,

is L-2 norm.

Finally, calculation was done by Kullback-Leibler Divergence (KL Divergence)

And

And (9) constructing a query word prediction module. The present invention designs an additional auxiliary task, namely, a query word prediction module, to predict the query words corresponding to the input text, as shown in fig. 3. The query word prediction module is composed of a classifier, the input of the classifier is an answer label sequence, and a TextCNN model is adopted as the classifier of the query words. Given answer label sequence

P_TextCNN(u_t＝s_k)＝Softmax(TextCNN(x_ans))

wherein u is_tAs output of the classifier, s_kIs the label value in the classified label set; TextCNN (·) is a TextCNN network function.

Finally, the cross entropy is used as the loss function of the module:

is the answer x_ansCorresponding real query word tags.

L＝λ_q·L_q+λ_k·L_k+λ_s·L_s+λ_r·L_r

In the steps (2) and (3), a static attention vector based on a one-hop knowledge graph is introduced into a model encoder part, so that a decoder is promoted to generate a problem that semantic information is richer. Meanwhile, the attention vector of the one-hop static graph is introduced, so that the waste of computing resources and under-fitting caused by using a global large-scale knowledge graph can be avoided, the potential semantic information of sentences can be greatly captured, and the quality of a system model is improved.

In the step (4), the input layer of the encoder not only considers the word embedding vector, but also considers the attention vector of the one-jump static image, the position information of the word and the vocabulary characteristics, so that the context semantic characteristics of the input text are greatly enriched, and the encoder is promoted to obtain richer semantic information.

In the step (5), the encoder part adopts a gated self-attention mechanism, the self-attention mechanism can dynamically and adaptively acquire the semantic information of the context, and the key point is that the weight of the self-attention is dynamically calculated, so that the model can pay attention to different context words with a tendency according to the weights of the different weights, and the performance of the model for acquiring the context information is greatly improved. The gating mechanism ensures that the final vector generated by the encoder can reach balance between the hidden layer state and the output result of the self-attention mechanism, avoids the excessive attention of the model to certain context, and further improves the generalization capability of the model.

In the step (6), an attention mechanism is adopted in the decoding process, so that the problem of information loss caused by overlong input sequences is effectively avoided, and the performance of the encoder is further improved.

In the step (7), an additional auxiliary task module, namely a knowledge matching module, is introduced to promote that the model can focus on the fact information most relevant to the question and answer. Two kinds of distribution are introduced, the first kind is prior distribution introducing external knowledge, the second kind is posterior distribution introducing external knowledge and decoder hidden layer state information, and similarity degree between the two kinds of distribution is evaluated through JS divergence, thereby promoting that the problem generated by the system can be closer to relevant facts.

In said step (8), an additional auxiliary task module, i.e. a semantic search space matching module, is introduced to make the generated question semantically as close as possible to the input text.

In the step (9), an additional auxiliary task module, namely a query word prediction module is introduced, and the module is composed of a simple classifier and aims to predict the query words in the encoder part and promote the decoder to generate more pertinent problems.

The experimental data set is a Stanford QuestionAnswering Dataset (SQuAD) question-answer data set, and 5 experiments are supplemented to prove the effectiveness of the system, wherein the effectiveness comprises the following steps:

(1) and automatically evaluating the experiment, wherein the specific indexes are BLEU-1, BLEU-2, BLEU-3, BLEU-4, Rouge-L and Meteor.

(2) The evaluation indexes of the model self-ablation experiment are BLEU-1, BLEU-2, BLEU-3, BLEU-4, Rouge-L and Meteor.

The model in the invention is named as KBIDN (knowledge Graph Based and iterative Word drive network), and the baseline models participating in experimental comparison are Seq2Seq, Seq2Seq + Att, NQG + +, s2s-a-at-mcp-gsa, Q-drive and ASs2 s. The models participating in the ablation experiment are KBIDN, KBIDN-w/o OSGA, TBGAN-w/o LF and TBGAN-w/o GSA. TBGAN-w/o CP, TBGAN-w/o K & S, TBGAN-w/o SSSM and TBGAN-w/o IWP represent ablation submodels for removing a one-hop static map attention mechanism, removing lexical feature information, removing a gated self-attention mechanism layer, removing a copy mechanism, simultaneously removing a knowledge matching module and a semantic search space matching module, removing a semantic search space matching module and removing a query word prediction module, respectively.

(3) Interrogative predictive experiments. The baseline models involved in the experiment were NQG + + and Q-drive.

(4) And (3) researching an influence experiment of beam search on Seq2Seq decoding, wherein evaluation indexes are BLEU-1, BLEU-2, BLEU-3, BLEU-4, Rouge-L and Meteor.

(5) Sample analysis experiments, the baseline models involved in the experiments were NQG + + and ASs2 s.

TABLE 1 Performance of the model in the automated assessment experiment

Table 1 shows the experimental results of the proposed model and baseline model automated evaluation. The model provided by the invention is higher than the baseline model in all indexes, and the problem that the model can be generated to be closer to the ground route is fully proved. The reason may be that the model proposed by the present invention introduces a gated self-attention mechanism in the encoder part to improve the ability of the encoder to acquire context semantic information, and meanwhile, a pre-trained one-hop knowledge map-based static attention vector is spliced in the word embedding layer to further integrate external knowledge information into the hidden layer state of the encoder. In addition, the knowledge matching module further promotes the generation of the distance between the problem and the group channel in the semantic space by introducing the posterior information, so that the problem closer to the group channel can be generated. After the attention mechanism is introduced to the naive Seq2Seq model, the experimental effect is remarkably improved, and the attention mechanism can fully acquire the context semantic information in the encoder. The Q-drive model fully shows that the multi-task learning technology can improve the performance of the simple model by adding a simple auxiliary task, namely, a predicted question word, and the experimental result of the Q-drive model is higher than s2s-a-at-mcp-gsa with a complex model structure.

TABLE 2 Performance of the model in ablation experiments

Table 2 problem generation results of systematic ablation experiments. It was found that the overall performance of the model was degraded regardless of which part of the sub-modules was removed. It is worth noting that when the static attention vector based on one hop is removed or the probability distribution of copy is not calculated, the performance of the model is reduced remarkably, the scores of the model on the index BLEU-4 are reduced by 5.9% and 6.2%, respectively, and the problem that the model generation is close to the ground route can be greatly promoted by introducing an OSGA module and a copy mechanism is effectively proved. The reasons may be: 1) a large amount of external knowledge is fused on the basis of the one-hop static attention vector, and the context semantic information of the text input by the encoder is greatly enriched; 2) the copy mechanism can improve the probability of low-frequency words in the probability distribution of the generated words, the BLEU and other indexes are calculated according to the overlapping degree of the words, the higher the overlapping degree is, the higher the score is, and therefore the copy mechanism can also greatly improve the performance of the model. In addition, when the knowledge matching module and the semantic search space matching module are removed, the score of the model on the BLEU-4 index is reduced most remarkably, and the amplitude is 10.9%, because the knowledge matching module utilizes a large amount of posterior knowledge to improve the coding performance of the coder in the training stage so as to generate a problem which is more close to ground route; and when only the semantic search space matching module is removed, the scores of all indexes slightly rise, and the effectiveness of the knowledge matching module is laterally proved. Finally, the fact that the influence of the gating self-attention mechanism removal module or the query word prediction module on the model is small is found, the reduction amplitude on the BLEU-4 is 2.9% and 2.2% respectively, and the reason is probably that under the multi-task learning framework, the richness of the gating self-attention mechanism for acquiring the context semantic information is reduced due to the fact that the generalization capability of the model is improved.

TABLE 3 percentage of doubtful words in SQuAD dataset

TABLE 4 accuracy rate of question words predicted by the question generation System

Table 3 shows the percentage of each query in the SQuAD data set. Table 4 shows the accuracy of the proposed and baseline models to predict the interrogative words. The accuracy of the KBIDN model provided by the invention on most of doubtful word prediction is slightly higher than that of a Q-drive model and is far higher than that of an NQG + + model. The reason may be that the present invention treats the predictive interrogatories as a separate subtask and trains them jointly with Seq2Seq and other auxiliary tasks. During the training process, the model can continuously improve the prediction capability for generating the questioning words and promote the decoder to generate more pertinent problems. The NQG + + model does not design additional auxiliary tasks for the questioning words, so that the prediction accuracy rate of the NQG + + model on most of the questioning words is low, but it is worth noting that the NQG + + model has the highest accuracy rate on the questioning words of the type "why", because the NQG + + model considers the conversion relation among the questioning words, for example, "why" can be replaced by "for what", so that the NQG + + model has a better prediction effect on the type "why". Although the Q-drive model also adopts a similar idea to construct an independent auxiliary task to learn the relevant information of the query word, the encoder of the Q-drive model is inferior to KBIDN in the capability of acquiring the context semantic information because external knowledge is not introduced to enrich the context information, so that the Q-drive model cannot concentrate on the entity or knowledge relevant to the input text, and the prediction result is influenced.

TABLE 4 comparison of greedy and bundle searches on the TBIDN model

In the decoding process of Seq2Seq, a word corresponding to the maximum value of the word probability distribution at each moment is often output, and the algorithm is a greedy search algorithm. However, greedy search has a certain limitation, because the output of each step of the decoder depends on the input of the previous step, if only the word with the highest probability is selected, the model is easily trapped in the local optimal solution, and the global optimal solution cannot be guaranteed. To alleviate this problem, it is conceivable to use a beam search algorithm, the core idea of which is to generate k most probable sequences at each decoding time step, followed by k | V at each subsequent step_bsSelecting k most likely sequences from the | candidates, where k and V_bsRespectively representing the size of the bundle and the vocabulary, | V_bs| represents the size of the vocabulary. As shown in table 4, 3 different values of k were selected for the experiment and compared to the greedy search. Obviously, the scores of the sequences decoded by the beam search algorithm are higher than the scores of the sequences decoded by the greedy search algorithm on each index, and the experimental results are better along with the increase of the k value, which fully shows that the beam search algorithm can enable a decoder to jump out of a local optimal solution and search the range of a solution space as much as possible so as to achieve the aim of obtaining the optimal solutionAnd (4) global optimization. On the other hand, although the model performance improves with increasing beam size k, the decoder decoding complexity increases.

TABLE 5 Experimental results of sample analysis of problem Generation System

Table 5 shows the experimental results of the sample analysis of the problem generation model, and the NQG + + and ASs2s models were selected as baseline models for this experiment. The answer label is "Engineering News-Record", which is the name of a magazine, so the question is asked using the "what" beginning. Both ASs2s and KBIDN generate a problem beginning with "what" and the semantics are close to ground route, proving the validity of the two models for simple input sentences. The problem generated by the KBIDN also comprises an important entity "index", and semantic information of the generated problem is further enriched. On the other hand, the NQG + + generated question begins with "in which", stating that it does not capture the true information of the answer label. It is noted that the answer label "engineering news-Record" is included in each of the three model-generated questions, further demonstrating the importance and effectiveness of the copy mechanism in the question generation system.

Parts of the invention not described in detail are well known in the art. The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the scope of the specific embodiments, and it is obvious to those skilled in the art that various changes are made within the spirit and scope of the present invention defined and determined by the claims, and all the inventions utilizing the inventive concept are protected.

Claims

1. A knowledge-graph and question-word-driven question generation system, comprising:

the text preprocessing module is used for preprocessing the text;

a knowledge matching module; to generate semantically more relevant questions;

2. A problem generation method based on knowledge graph and question word drive is characterized by comprising the following steps:

step (1), preprocessing a text;

step (2), constructing a one-hop knowledge graph based on the preprocessed text;

step (3), calculating a static graph attention vector based on a one-hop knowledge graph;

step (4), constructing a characteristic-enhanced encoder, namely firstly converting a one-hop knowledge map into a one-dimensional static word embedding vector, and then splicing the one-dimensional static word embedding vector with a word embedding vector of an input text, an answer position information vector of the input text and a vocabulary characteristic information vector to be used as net input of the encoder so as to enhance the capability of the encoder for acquiring context semantic information;

constructing a gating self-attention mechanism module to store additional context semantic information of the encoder part and further expand the context semantic of the input text;

step (6), constructing a decoder, which is used for decoding the intermediate state one-dimensional vector coded by the coder so as to output the final word probability distribution;

step (7), constructing a knowledge matching module; to generate semantically more relevant questions;

step (8), constructing a semantic search space matching module to calculate the semantic similarity of the question and the answer;

and (9) constructing a query word prediction module to predict the query words corresponding to the input text.

3. The method for generating question based on knowledge-graph and question word driving according to claim 2, characterized in that the step (1) of preprocessing the text comprises the following specific steps:

processing the text formats in a unified way, namely processing all texts, deleting redundant spaces before, after and in the middle of the texts, and removing non-English letter symbols; encoding each Word into a multi-dimensional Word embedding form by adopting Global Vectors for Word retrieval (GloVe) encoding, wherein the vocabulary size of the GloVe is selected as N_GThe unknown word is expressed as<UNK>。

4. The problem generation method based on knowledge graph and query term driving according to claim 2, wherein the step (2) of constructing a one-hop knowledge graph comprises the following specific steps:

selecting a ConceptNet large-scale common sense map as a knowledge base, searching a one-hop node in the common sense map for each word aiming at the input text, fixing the number of the nodes to be 60, adopting a bottom-bound triple NOT _ A _ FACT to represent the triple which is NOT matched with any entity, obtaining the one-hop knowledge map formed by the triples, and simultaneously keeping a copy of the one-hop knowledge map.

5. The method of claim 2, wherein the step (3) of calculating a one-hop-based histogram-based static graph attention vector, matching each word in the input sentence to its corresponding multi-hop graph, and converting the multi-hop graph into a corresponding static graph attention vector for input into the encoder structure, let K ═ { K ═ K { (K) } K₁,...,k_|K|The triple vector of the knowledge map is represented as k_i＝(h_i,r_i,t_i) Where, | K | represents the number of triples in the set K, h_i、r_iAnd t_iAre respectively triplets k_iCorresponding head, relation and tail vectors, i ∈ [1, | K](ii) a In order to obtain the graph embedding vector corresponding to the one-hop knowledge graph, a one-hop knowledge graph set is obtained firstly

Wherein x is an input sequence; | x | is the length of the input sequence; one represents the relevant symbol of the one-hop knowledge-graph,

one-hop knowledge graph at time t

The corresponding three tuple sets are

Wherein the content of the first and second substances,

is a set

The number of elements contained in the steel;

is provided with

And

j-th triplet at respective time t

The included head, relationship and tail vectors; calculating the final one-jump static map attention vector g at the time t by the following formula_t：

6. The method for generating question based on knowledge-graph and question-word driving as claimed in claim 2, wherein the step (4) of constructing feature-enhanced encoder, the encoder uses a bidirectional Long Short-Term Memory (LSTM) for encoding, and the calculation is as follows:

wherein the content of the first and second substances,

and

hidden layer vectors respectively representing positive direction decoding and reverse direction decoding of the bidirectional LSTM at the time t; enc is the mark of the encoder, the same as below;

7. The knowledge-graph-and-query-word-driven question generation method according to claim 2, wherein said step (5) of constructing a gated self-attention mechanism module first obtains a coder hidden layer vector matrix

Wherein the content of the first and second substances,

representing a two-dimensional set of real number fieldsThe dimension is | x | × 2 d; d is the dimension of the LSTM hidden layer state; | x | is the length of the input sequence x;

is provided with

[；]Is a vector concatenator; then, a self-attention moment array is obtained by a self-attention algorithm

Wherein the content of the first and second substances,

and

respectively representing matrices, fusions, derived from attention algorithmsThe original matrix and the matrix obtained after the attention algorithm and the matrix finally obtained by the gating mechanism, and

8. A knowledge-graph-and-query-word-driven question generation method according to claim 2, characterized in that said step (6) of constructing a decoder part, the decoder part being constituted by another LSTM, as follows:

wherein the content of the first and second substances,

is the hidden layer vector of the decoder at the moment t; dec is a correlation symbol representing the decoder; y is_t-1Represents the output vector of the decoder at time t-1;

a gated self-attention vector for encoder time j; softmax (-) and τ (-) are the Softmax function and the bilinear attention function, respectively; [;]is a vector connector;

a copy mechanism is introduced in the decoding stage to avoid that the decoder ignores some important low frequency words, i.e. the decoder generates two distributions:

P(o_t)＝(1-μ_t)P_v(o_t＝w_v)+μ_tP_c(o_t＝w_c)

wherein, mu_tA threshold value between 0 and 1; sigma (·), MLP (·) and copy (·) are respectively a Sigmoid function, a multilayer perceptron function and a copy mechanism function; o_tIs the output value at the time t; w is a_vAnd w_cRespectively representing words generated from a vocabulary and generated according to a copy mechanism; subscripts v and c represent bar identifiers generated from the vocabulary and according to the copy mechanism, respectively; [;]is a vector connector;

the loss function of the decoder is represented by:

9. The knowledge-graph-and-query-driven question generation method according to claim 2, characterized in that said step (7) of constructing a knowledge matching module to generate semantically more relevant questions; given set of knowledge-graph fact triplets vectors

And is

Has f_i＝(h_i,r_i,t_i) Wherein, | Γ | is the number of elements in the triple set Γ; h is_i、r_iAnd t_iAre triplets f respectively_iCorresponding head, relation and tail vectors; h is_iAnd t_iHas a dimension of d_e,r_iHas a dimension of d_rThus, the set Γ may constitute a knowledge matrix

In the coder-decoder structure, the hidden layer state of the coder is regarded as prior information, the hidden layer state of the decoder is regarded as posterior knowledge, and the knowledge matching module calculates the distribution of two kinds of fused external information, namely prior distribution Zeta_priorAnd posterior distribution ζ_postIf the dimensions of the two distributions are | Γ |, the ith dimensions of the two distributions both represent the attention degree of the model to the ith fact triple;

attention vector for decoder y time instant;

L_k＝JS(ζ_prior||ζ_post)

10. The method for generating question based on knowledge-graph and question-word driving according to claim 2, wherein said step (8) is to construct a semantic search space matching module to calculate semantic similarity between question and answer, and the parameter matrix W obtained in step (7) is_FAnd W_postContains a posteriori information, i.e., semantic information related to the question; designing two mapping functions;

and

the space vectors are searched for the semantics corresponding to the question and the answer, respectively, and in addition, there are:

wherein the content of the first and second substances,

is L-2 norm;

finally, calculation was done by Kullback-Leibler Divergence (KL Divergence)

And

11. The question generating method based on knowledge graph and query term driving according to claim 2, characterized in that, the step (9) of constructing a query term prediction module to predict the query terms corresponding to the input text, wherein the query term prediction module is composed of a classifier, the input of the classifier is an answer label sequence, and a TextCNN model is used as the classifier of the query terms; given answer label sequence

P_TextCNN(u_t＝s_k)＝Softmax(TextCNN(x_ans))

wherein u is_tAs output of the classifier, s_kIs the label value in the classified label set; TextCNN (·) is a TextCNN network function;

finally, the cross entropy is used as the loss function of the module:

is the answer x_ansA corresponding real query word label;

the final loss function is calculated by:

L＝λ_q·L_q+λ_k·L_k+λ_s·L_s+λ_r·L_r