CN114168749A - Question generation system based on knowledge graph and question word drive - Google Patents

Question generation system based on knowledge graph and question word drive Download PDF

Info

Publication number
CN114168749A
CN114168749A CN202111475261.5A CN202111475261A CN114168749A CN 114168749 A CN114168749 A CN 114168749A CN 202111475261 A CN202111475261 A CN 202111475261A CN 114168749 A CN114168749 A CN 114168749A
Authority
CN
China
Prior art keywords
vector
knowledge
attention
word
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111475261.5A
Other languages
Chinese (zh)
Inventor
荣文戈
周世杰
欧阳元新
熊璋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202111475261.5A priority Critical patent/CN114168749A/en
Publication of CN114168749A publication Critical patent/CN114168749A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a problem generation system based on knowledge graph and question word drive, comprising: the text preprocessing module is used for preprocessing the text; the one-hop knowledge map construction module is used for constructing a one-hop knowledge map based on the preprocessed text; the attention vector calculation module is used for calculating a static map attention vector based on a one-hop knowledge map; a feature enhanced encoder; the gated attention mechanism module is used for storing additional context semantic information of the encoder part and further expanding the context semantic of the input text; the decoder is used for decoding the intermediate state one-dimensional vector coded by the coder so as to output final word probability distribution; a knowledge matching module; the semantic search space matching module is used for calculating the semantic similarity of the question and the answer; and the query word prediction module is used for predicting the query words corresponding to the input text.

Description

Question generation system based on knowledge graph and question word drive
Technical Field
The invention belongs to the technical field of natural language processing research, and particularly relates to a problem generation system based on knowledge map and question word drive.
Background
In recent years, with the tremendous increase of computer hardware computing power and the deep progress of deep learning research, the natural language generation technology has greatly progressed. The problem Generation technology has also achieved certain results as one of the most important links of Natural Language Generation (NLG), and a number of data-driven deep learning models have been created. With the popularization of artificial intelligence application, people have stronger and stronger cravings for man-machine question answering, and a problem generation system is the most complex and extremely challenging ring in artificial intelligence, especially natural language processing. On the one hand, the generated questions must be able to catch the topic of the question and answer and the relevant facts, and on the other hand, the model generated questions must be highly rich and diverse to ensure a high quality user experience.
Knowledge maps have been shown to greatly improve the performance of Natural Language Processing (NLP) models. In the daily chat or conversation process, the question raising is a very common scenario. Therefore, generating a proper and meaningful question is critical to the automated question and answer technology. The question generation plays an extremely important role in the question and answer task, aims to generate questions related to the text according to the given input text, and is widely applied to the fields of question and answer systems, dialog systems, chat robots and the like. In daily chatting, a problem is thrown, and the topic of the chatting can be determined, so that subsequent conversations can be better carried out; in a search engine, people often input a question, expecting to obtain relevant answers and retrieve contents; in the intelligent customer service system, the system can automatically generate the problems associated with the keywords input by the user, and provides the user for searching, thereby greatly improving the customer service efficiency. In recent years, various problem generation models have been proposed by many scholars, however, the problem of semantic mismatch, especially the question words of the problem, still occurs. Whether the query word is correct or not directly determines whether the semantics of a question are clear and definite. For example, for The place "The Forbidden City", The generated problem needs to begin with "where", otherwise, The problems of unclear semantics and no ambiguity occur, and The user experience and The model performance are seriously affected. On the other hand, whether the semantics of the question are rich is also one of the important factors for determining the quality of the question generation model. In the question-answer scenario, both the question and the answer are often discussing something, and have a certain relevance, for example, for the answer "Ilike applets best all", the question is often asked around "front". Thus, fusing knowledge into the question generation model may expand the semantics of the input text to generate a higher quality question.
The difficulty of current research in the field of problem generation is mainly that: 1) the semantic meaning of the generated problem is not rich enough, and the problem of boring and boring is often generated; 2) models are prone to generating problems that are not sufficiently close or wrong, such as wrong query words, semantically irrelevant problems, etc., causing ambiguity or misunderstanding. In conclusion, the introduction of external knowledge and predictive query words into the problem generation system has certain prospects, so that the direction is selected as the research focus of the invention.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: the problem that problems generated in a traditional problem generation system based on a neural network are too general and easy to separate from problems is solved, and meanwhile, semantic information of context is enhanced and the generated semantics are richer is solved. The performance of the model is improved by a one-hop-based graph attention machine mechanism and three auxiliary tasks, and meanwhile the prediction accuracy of the questioning words is improved. The first core point of the question generation system is how to generate questions in accordance with the question-answer semantics. Compared with other existing problem generation system models, the problem generated by the method is richer in semantics and closer to question and answer facts. Other problem generation systems often generate relatively boring, general or off-topic problems, which greatly reduces user experience. The second core point of the problem generation system is how to enhance the context semantics, and the enhancement of the context semantics can make the problem generated by the model more real. Therefore, the semantic information of the context is enhanced by introducing a one-hop knowledge graph structure, a knowledge matching module and a semantic search space matching module, and finally the purpose of improving the system performance is achieved.
The system jointly learns the problem generation task under the multi-task learning framework and outputs the final result. Specifically, the system is designed from four aspects under the framework of multitask learning: firstly, a gating self-attention mechanism is designed, and the mechanism can dynamically and self-adaptively acquire the semantic information of the context, so that the encoding performance of an encoder is improved; secondly, a separate auxiliary task, namely a knowledge matching module, is designed to promote the model to be capable of paying attention to the fact information most relevant to the question and answer; thirdly, a semantic search space matching module is constructed to shorten the distance between the question and the answer in the semantic search space; fourthly, a questioning word predicting module is constructed, which aims to predict and output the questioning words corresponding to the problems, and further improves the quality of the generated problems. Through a multi-task learning mechanism, the invention can generate more appropriate problems.
The technical scheme for solving the technical problems comprises the following steps: a knowledge-graph and question-word-driven question generation system, comprising:
the text preprocessing module is used for preprocessing the text;
the one-hop knowledge map construction module is used for constructing a one-hop knowledge map based on the preprocessed text;
the attention vector calculation module is used for calculating a static map attention vector based on a one-hop knowledge map;
the characteristic-enhanced encoder is characterized in that a one-hop knowledge map is converted into a one-dimensional static word embedding vector, and then the one-dimensional static word embedding vector is spliced with a word embedding vector of an input text, an answer position information vector of the input text and a vocabulary characteristic information vector to serve as net input of the encoder so as to enhance the capability of the encoder for acquiring context semantic information;
the gated attention mechanism module is used for storing additional context semantic information of the encoder part and further expanding the context semantic of the input text;
the decoder is used for decoding the intermediate state one-dimensional vector coded by the coder so as to output final word probability distribution;
a knowledge matching module; to generate semantically more relevant questions;
the semantic search space matching module is used for calculating the semantic similarity of the question and the answer;
and the query word prediction module is used for predicting the query words corresponding to the input text.
According to another aspect of the invention, a problem generation method based on knowledge graph and question word drive is provided, and the method comprises the following specific steps:
the method comprises the following steps of (1) preprocessing a text, and specifically comprises the following steps:
and uniformly processing the text formats, namely processing all texts, deleting redundant spaces before, after and in the middle, and removing non-English letter symbols. Encoding each word into a 300-dimensional word embedding form by using GlobalVectors for WordRepression (GloVe) encoding, wherein the vocabulary size of the GloVe is selected to be 30000, and the unknown word is expressed as < UNK >.
Step (2), constructing a one-hop knowledge graph, which comprises the following specific steps:
selecting a concept net large-scale common sense graph as a knowledge base, searching one-hop nodes in the common sense graph aiming at each word of an input text, wherein the number of the nodes is fixed to be 60. A bottom triplet NOT _ a _ FACT is used to represent triples that do NOT match to any entity. In this way, a one-hop knowledge graph is obtained that is composed of triplets. At the same time, a copy of the one-hop knowledge-graph is retained.
And (3) calculating the attention vector of the static graph based on the one-hop knowledge graph. Each word in the input sentence is matched to its corresponding multi-hop map and the multi-hop map is converted to a corresponding static map attention vector for input into the encoder structure. Let K be K ═ K1,...,k|K|The triple vector of the knowledge map is represented as ki=(hi,ri,ti) Where, | K | represents the number of triples in the set K, hi、riAnd tiAre respectively triplets kiCorresponding head, relation and tail vectors, i ∈ [1, | K]. In order to obtain the graph embedding vector corresponding to the one-hop knowledge graph, a one-hop knowledge graph set is obtained firstly
Figure BDA0003393414440000031
Wherein x is an input sequence; | x | is the length of the input sequence; one represents the relative symbol of the one-hop knowledge-graph, as follows.
Figure BDA0003393414440000032
One-hop knowledge graph at time t
Figure BDA0003393414440000033
The corresponding three tuple sets are
Figure BDA0003393414440000034
Wherein the content of the first and second substances,
Figure BDA0003393414440000035
is a set
Figure BDA0003393414440000036
The number of elements contained in the steel;
Figure BDA0003393414440000037
is provided with
Figure BDA0003393414440000041
And
Figure BDA0003393414440000042
j-th triplet at respective time t
Figure BDA0003393414440000043
The included head, relationship and tail vectors. Therefore, the final one-hop static map attention vector g at time t can be calculated by the following formulat
Figure BDA0003393414440000044
Figure BDA0003393414440000045
Wherein, gtRepresenting the attention vector of the one-jump static graph corresponding to the input at the time t; alpha is alphatiThe attention scores of the 0-hop entity and the ith one-hop entity input at the time t; exp (·) is an exponential function with a natural constant e as base; τ (-) is a bilinear attention function; [;]is the concatenation symbol of the vector.
And (4) constructing a characteristic enhanced encoder. The encoder uses a bidirectional Long Short-Term Memory (LSTM) for encoding, and the calculation is as follows:
Figure BDA0003393414440000046
Figure BDA0003393414440000047
Figure BDA0003393414440000048
wherein the content of the first and second substances,
Figure BDA0003393414440000049
and
Figure BDA00033934144400000410
respectively representing hidden layer vectors of the LSTM at the t moment; enc is the mark of the encoder, the same as below;
Figure BDA00033934144400000411
the input vector after splicing; x is the number oft、gt、mtAnd ltRespectively representing a word embedding vector at the time t, a corresponding one-hop static graph attention vector, an answer position information vector and a vocabulary characteristic information vector.
And (5) constructing a gated self-attention mechanism module. The invention designs a gating self-attention mechanism, aiming at enhancing the capability of an encoder for acquiring context semantic information. Firstly, a vector matrix of a hidden layer of an encoder is obtained
Figure BDA00033934144400000412
Wherein the content of the first and second substances,
Figure BDA00033934144400000413
d is the dimension of the LSTM hidden layer state; | x | is the length of the input sequence x;
Figure BDA00033934144400000414
is provided with
Figure BDA00033934144400000415
[;]Are vector concatenators. Then, a self-attention moment array is obtained by a self-attention algorithm
Figure BDA00033934144400000416
Finally, a gate control unit controls the finally generated self-attention moment array
Figure BDA00033934144400000417
Figure BDA00033934144400000418
Figure BDA00033934144400000419
Figure BDA00033934144400000420
Figure BDA0003393414440000051
Wherein the content of the first and second substances,
Figure BDA0003393414440000052
and
Figure BDA0003393414440000053
respectively representing a matrix obtained from the attention algorithm, a matrix obtained after fusing the original matrix and the attention algorithm, and a matrix finally obtained through a gating mechanism, and
Figure BDA0003393414440000054
| x | is the length of the input sequence x; q, K and V are both state parameter matrices calculated from the attention scores; softmax (-), tanh (-), MLP (-), and σ (-), represent the Softmax function, tanh function, the multi-layer perceptron function, and the Sigmoid function, respectively;
Figure BDA0003393414440000055
a connection symbol representing a matrix; an indicator indicating a multiplication of a corresponding position of the matrix; j is a full 1 matrix, and the matrix dimension is consistent with H.
And (6) constructing a decoder part. The invention designs a decoder based on attention mechanism, which is composed of another LSTM, as follows:
Figure BDA0003393414440000056
Figure BDA0003393414440000057
Figure BDA0003393414440000058
wherein the content of the first and second substances,
Figure BDA0003393414440000059
is the hidden layer vector of the decoder at the moment t; dec is a correlation symbol representing the decoder, the same applies below; y ist-1Represents the output vector of the decoder at time t-1;
Figure BDA00033934144400000510
an attention vector representing time t; beta is atjRepresenting the attention score between the hidden layer vector at the moment t of the decoder and the jth input sequence of the encoder;
Figure BDA00033934144400000511
a gated self-attention vector for encoder time j; softmax (-) and τ (-) are the Softmax function and the bilinear attention function, respectively; [;]are vector connectors.
A copy mechanism is introduced at the decoding stage to avoid that the decoder ignores some important low frequency words. I.e. the decoder generates two distributions:
Figure BDA00033934144400000512
Figure BDA00033934144400000513
Figure BDA00033934144400000514
P(ot)=(1-μt)Pv(ot=wv)+μtPc(ot=wc)
wherein, mutA threshold value between 0 and 1; sigma (·), MLP (·) and copy (·) are respectively a Sigmoid function, a multilayer perceptron function and a copy mechanism function; otIs the output value at the time t; w is avAnd wcRespectively representing words generated from a vocabulary and generated according to a copy mechanism; subscripts v and c represent bar identifiers generated from the vocabulary and according to the copy mechanism, respectively; [;]are vector connectors.
The loss function of the decoder can be expressed by:
Figure BDA0003393414440000061
wherein L isqIs a loss function of the decoder; log (-) is a logarithmic function with a natural constant e as a true number; y istIs the output value of the decoder at the moment t; | y | is the length of the decoder output sequence; x is an input sequence; Γ is the set of knowledge-graph fact triple vectors.
And (7) constructing a knowledge matching module. The invention designs an additional knowledge matching module to generate a problem with more relevant semantics. In a question-and-answer scenario, questions and answers tend to have strong semantic relevance and both expand based on certain facts. Given set of knowledge-graph fact triplets vectors
Figure BDA0003393414440000062
And is
Figure BDA0003393414440000063
Has fi=(hi,ri,ti) Wherein, | Γ | is the number of elements in the triple set Γ; h isi、riAnd tiAre triplets f respectivelyiCorresponding head, relation and tail vectors; h isiAnd tiHas a dimension of de,riHas a dimension of dr. Thus, the set Γ may constitute oneKnowledge matrix
Figure BDA0003393414440000064
In the encoder-decoder architecture, the hidden layer state of the encoder can be regarded as prior information, and the hidden layer state of the decoder can be regarded as posterior knowledge, for which, the knowledge matching module calculates two distributions fusing external information, namely prior distribution zetapriorAnd posterior distribution ζpostAnd the dimensions of the two distributions are both | Γ |, the ith dimension of the two distributions represents the attention degree of the model to the ith fact triple.
Figure BDA0003393414440000065
Figure BDA0003393414440000066
Wherein, Softmax (-) and tanh (-) are Softmax function and tanh function respectively; wF、WpriorAnd WpostAre all parameter matrices; [;]is a vector connector; an indicator indicating a multiplication of a corresponding position of the matrix; | x | is the length of the input sequence x; | y | is the length of the decoder output sequence y;
Figure BDA0003393414440000067
is the attention vector at decoder y instant.
And finally, taking Jensen-Shannon Divergence (JS Divergence) of the prior distribution and the posterior distribution as a loss function of the knowledge matching module:
Lk=JS(ζprior||ζpost)
wherein L iskA loss function for the knowledge matching module; JS (| |) is the JS divergence function.
And (8) constructing a semantic search space matching module. The invention designs an additional auxiliary task, namely a semantic search space matching module, so as to calculate the semantic similarity of the question and the answer. The parameter matrix W obtained in step (7)FAnd WpostContains a large amount of a posteriori information, i.e. semantic information related to the problem. For this purpose, the invention designs two mapping functions
Figure BDA0003393414440000068
And ψ (-) to project the question vector and answer vector to the corresponding semantic search spaces, respectively:
Figure BDA0003393414440000071
Figure BDA0003393414440000072
Figure BDA0003393414440000073
wherein e isaveThe average value of all word embedding vector splicing vectors of the word embedding layer is obtained; x is the number oft、gt、mtAnd ltRespectively representing a word embedding vector at the time t, a corresponding one-hop static graph attention vector, an answer position information vector and a vocabulary characteristic information vector; ws、WFAnd WpostAre all parameter matrices;
Figure BDA0003393414440000074
and
Figure BDA0003393414440000075
the space vectors are searched for the semantics corresponding to the question and the answer, respectively. In addition, there are:
Figure BDA0003393414440000076
Figure BDA0003393414440000077
wherein the content of the first and second substances,
Figure BDA0003393414440000078
and ψ (·) is a mapping function for the question and answer sequences, respectively;
Figure BDA0003393414440000079
is L-2 norm.
Finally, calculation was done by Kullback-Leibler Divergence (KL Divergence)
Figure BDA00033934144400000710
And
Figure BDA00033934144400000711
distance in the projected semantic search space and as a loss function of this module:
Figure BDA00033934144400000712
wherein L issLoss function of this module, DKL(| |) is the KL divergence function.
And (9) constructing a query word prediction module. The invention designs an additional auxiliary task, namely a query word prediction module, so as to predict the query words corresponding to the input text. The query word prediction module is composed of a classifier, the input of the classifier is an answer label sequence, and a TextCNN model is adopted as the classifier of the query words. Given answer label sequence
Figure BDA00033934144400000713
Wherein x isaA starting word representing a sequence of answer labels; | xansIf | represents the length of the answer label sequence, the output probability distribution of the predicted query word is calculated as follows:
PTextCNN(ut=sk)=Softmax(TextCNN(xans))
wherein u istAs output of the classifier, skTo classifyA tag value in a set of tags; TextCNN (·) is a TextCNN network function.
Finally, the cross entropy is used as the loss function of the module:
Figure BDA00033934144400000714
wherein Lr is a loss function of the module; log (-) is a logarithmic function with a natural constant e as a true number;
Figure BDA0003393414440000081
is the answer xansCorresponding real query word tags.
To this end, the final loss function of the present invention can be calculated by:
L=λq·Lqk·Lks·Lsr·Lr
wherein L is the final loss function; lambda [ alpha ]q、λk、λsAnd λrAre respectively Lq、Lk、LsAnd LrThe weight coefficient of (2).
Compared with the prior art, the invention has the advantages that:
1. the invention is optimized from two aspects. In one aspect, to improve the ability of the encoder to obtain context information, the model employs a feature-enhanced encoder. Meanwhile, a gated self-attention mechanism is also provided to further acquire the context semantic information of the input text. The self-attention vector can be regarded as an additional memory network for storing rich contextual information to enhance contextual semantic information of the input text. On the other hand, the model extracts knowledge triplets from the input text and vectorizes them as knowledge fusion embedding layers to further expand the context of the encoding stage. Meanwhile, the invention constructs 3 independent subtask modules which are respectively a knowledge matching module, a semantic search space matching module and a questioning word prediction module. The core idea of the knowledge matching module is to improve the capability of the model to pay attention to the fact related to the input text, and promote the decoder to generate the problem that the semantics are more relevant by introducing an external knowledge spectrogram and the posterior information of a decoding stage. The semantic search space matching module calculates the semantic matching degree between the text and the output problem, and performs combined training by taking the semantic matching degree as a part of a loss function, so that the semantic association degree between the generated problem and the input text is improved, and the generation independence problem is further avoided. Finally, the questioning word predicting module constructs a multitask classifier introducing external knowledge so as to improve the precision rate of predicting the questioning words.
In order to combine the 3 auxiliary subtasks proposed above, i.e., the knowledge matching module, the semantic search space matching module, and the query word prediction module, with the encoder-decoder architecture as the main task, the present invention adopts a multi-task learning strategy to calculate the loss functions of all four tasks, respectively, and takes the weighted and summed loss function as the final loss function of the model.
2. In the decoding stage, the invention strategically proposes a gating mechanism, and the finally generated words can be generated from two places, namely a word stock and an input text. Through a gating mechanism, the cumulative distribution of the two distributions is calculated, so that the finally generated word distribution is more accurate, the semantic information of the context can be captured more accurately, the model performance is greatly improved, and the user experience is improved. The problem of final generation is that the non-high frequency words are not easy to generate from the word list, so the invention obtains the probability of the output words from the word list through a copy mechanism, thereby leading the generated problem semantics to be richer and more appropriate, and simultaneously having a certain information content.
Drawings
FIG. 1 is a system overview chart generated based on knowledge-graph and question-word driven questions;
FIG. 2 is a schematic diagram of a model of a projection of a question vector and an answer vector to a semantic search space;
FIG. 3 is a schematic diagram of a module model for predicting interrogative words.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by a person skilled in the art based on the embodiments of the present invention belong to the protection scope of the present invention without creative efforts.
The invention relates to a question generation system based on knowledge graph and question word drive, and a system model is shown in figure 1. The model is trained under a multi-task learning framework, a main task module is a Seq2Seq framework based on an attention mechanism, and meanwhile, 3 independent auxiliary task modules are introduced, wherein the independent auxiliary task modules are respectively as follows: 1) a knowledge matching module; 2) a semantic search space matching module; 3) a query prediction module. The knowledge matching module fuses knowledge information into hidden layer vectors of an encoder and a decoder, and calculates two distributions, namely a priori distribution and a posterior distribution, so that a model can focus on the knowledge information which is more appropriate to a question and answer scene; the core idea of the semantic space searching module is to reduce the distance between the question vector and the answer vector in the projected semantic searching space so as to ensure that the generated question is semantically as close as possible to the answer; the query word prediction module constructs a simple classifier to predict the query words of the problem, and avoids generating a grammar error or an improper problem.
The method makes full use of the characteristics of a multi-task learning framework, and introduces 3 additional auxiliary tasks for improving the performance of the model. By introducing the knowledge graph and constructing the knowledge graph with one hop, the finally generated problem is more appropriate and the semantic information is richer.
The method firstly cuts the large-scale knowledge graph into a one-jump knowledge graph, and calculates the attention vector of the static graph through the attention mechanism of the static graph, wherein the vector is used for enhancing the semantic information of the input text. And the data is trained by combining with a multi-task learning framework, and the final experimental result of the invention is obviously superior to that of the existing problem generation system.
According to one embodiment of the invention, a system for generating questions based on knowledge-graph and query word driving comprises:
the text preprocessing module is used for preprocessing the text;
the one-hop knowledge map construction module is used for constructing a one-hop knowledge map based on the preprocessed text;
the attention vector calculation module is used for calculating a static map attention vector based on a one-hop knowledge map;
the characteristic-enhanced encoder is characterized in that a one-hop knowledge map is converted into a one-dimensional static word embedding vector, and then the one-dimensional static word embedding vector is spliced with a word embedding vector of an input text, an answer position information vector of the input text and a vocabulary characteristic information vector to serve as net input of the encoder so as to enhance the capability of the encoder for acquiring context semantic information;
the gated attention mechanism module is used for storing additional context semantic information of the encoder part and further expanding the context semantic of the input text;
the decoder is used for decoding the intermediate state one-dimensional vector coded by the coder so as to output final word probability distribution;
a knowledge matching module; to generate semantically more relevant questions;
the semantic search space matching module is used for calculating the semantic similarity of the question and the answer;
and the query word prediction module is used for predicting the query words corresponding to the input text.
According to another embodiment of the invention, a knowledge-graph and query word-driven question generation method is provided, which comprises the following steps:
the method comprises the following steps of (1) preprocessing a text, and specifically comprises the following steps:
and uniformly processing the text formats, namely processing all texts, deleting redundant spaces before, after and in the middle, and removing non-English letter symbols. Encoding each word into a 300-dimensional word embedding form by using GlobalVectors for WordRepression (GloVe) encoding, wherein the vocabulary size of the GloVe is selected to be 30000, and the unknown word is expressed as < UNK >.
Step (2), constructing a one-hop knowledge graph, which comprises the following specific steps:
selecting a concept net large-scale common sense graph as a knowledge base, searching one-hop nodes in the common sense graph aiming at each word of an input text, wherein the number of the nodes is fixed to be 60. A bottom triplet NOT _ a _ FACT is used to represent triples that do NOT match to any entity. In this way, a one-hop knowledge graph is obtained that is composed of triplets. At the same time, a copy of the one-hop knowledge-graph is retained.
And (3) calculating the attention vector of the static graph based on the one-hop knowledge graph. Each word in the input sentence is matched to its corresponding multi-hop map and the multi-hop map is converted to a corresponding static map attention vector for input into the encoder structure. Let K be K ═ K1,...,k|K|The triple vector of the knowledge map is represented as ki=(hi,ri,ti) Where, | K | represents the number of triples in the set K, hi、riAnd tiAre respectively triplets kiCorresponding head, relation and tail vectors, i ∈ [1, | K]. In order to obtain the graph embedding vector corresponding to the one-hop knowledge graph, a one-hop knowledge graph set is obtained firstly
Figure BDA0003393414440000101
Wherein x is an input sequence; | x | is the length of the input sequence; one represents the relative symbol of the one-hop knowledge-graph, as follows.
Figure BDA00033934144400001012
One-hop knowledge graph at time t
Figure BDA0003393414440000102
The corresponding three tuple sets are
Figure BDA0003393414440000103
Wherein the content of the first and second substances,
Figure BDA0003393414440000104
is a set
Figure BDA0003393414440000105
The number of elements contained in the steel;
Figure BDA0003393414440000106
is provided with
Figure BDA0003393414440000107
And
Figure BDA0003393414440000108
j-th triplet at respective time t
Figure BDA0003393414440000109
The included head, relationship and tail vectors. Therefore, the final one-hop static map attention vector g at time t can be calculated by the following formulat
Figure BDA00033934144400001010
Figure BDA00033934144400001011
Wherein, gtRepresenting the attention vector of the one-jump static graph corresponding to the input at the time t; alpha is alphatiThe attention scores of the 0-hop entity and the ith one-hop entity input at the time t; exp (·) is an exponential function with a natural constant e as base; τ (-) is a bilinear attention function; [;]is the concatenation symbol of the vector.
And (4) constructing a characteristic enhanced encoder. The encoder uses a bidirectional Long Short-Term Memory (LSTM) for encoding, and the calculation is as follows:
Figure BDA0003393414440000111
Figure BDA0003393414440000112
Figure BDA0003393414440000113
wherein the content of the first and second substances,
Figure BDA0003393414440000114
and
Figure BDA0003393414440000115
respectively representing hidden layer vectors of the LSTM at the t moment; enc is the mark of the encoder, the same as below;
Figure BDA0003393414440000116
the input vector after splicing; x is the number oft、gt、mtAnd ltRespectively representing a word embedding vector at the time t, a corresponding one-hop static graph attention vector, an answer position information vector and a vocabulary characteristic information vector.
And (5) constructing a gated self-attention mechanism module. The invention designs a gating self-attention mechanism, aiming at enhancing the capability of an encoder for acquiring context semantic information. Firstly, a vector matrix of a hidden layer of an encoder is obtained
Figure BDA0003393414440000117
Wherein the content of the first and second substances,
Figure BDA0003393414440000118
d is the dimension of the LSTM hidden layer state; | x | is the length of the input sequence x;
Figure BDA0003393414440000119
is provided with
Figure BDA00033934144400001110
[;]Are vector concatenators. Then, a self-attention moment array is obtained by a self-attention algorithm
Figure BDA00033934144400001111
Finally, the process is carried out in a batch,controlling the finally generated self-attention moment array through a gate control unit
Figure BDA00033934144400001112
Figure BDA00033934144400001113
Figure BDA00033934144400001114
Figure BDA00033934144400001115
Figure BDA00033934144400001116
Wherein the content of the first and second substances,
Figure BDA00033934144400001117
and
Figure BDA00033934144400001118
respectively representing a matrix obtained from the attention algorithm, a matrix obtained after fusing the original matrix and the attention algorithm, and a matrix finally obtained through a gating mechanism, and
Figure BDA00033934144400001119
| x | is the length of the input sequence x; q, K and V are both state parameter matrices calculated from the attention scores; softmax (-), tanh (-), MLP (-), and σ (-), represent the Softmax function, tanh function, the multi-layer perceptron function, and the Sigmoid function, respectively; o denotes a connection symbol of the matrix; an indicator indicating a multiplication of a corresponding position of the matrix; j is a full 1 matrix, and the matrix dimension is consistent with H.
And (6) constructing a decoder part. The invention designs a decoder based on attention mechanism, which is composed of another LSTM, as follows:
Figure BDA0003393414440000121
Figure BDA0003393414440000122
Figure BDA0003393414440000123
wherein the content of the first and second substances,
Figure BDA0003393414440000124
is the hidden layer vector of the decoder at the moment t; dec is a correlation symbol representing the decoder, the same applies below; y ist-1Represents the output vector of the decoder at time t-1;
Figure BDA0003393414440000125
an attention vector representing time t; beta is atjRepresenting the attention score between the hidden layer vector at the moment t of the decoder and the jth input sequence of the encoder;
Figure BDA0003393414440000126
a gated self-attention vector for encoder time j; softmax (-) and τ (-) are the Softmax function and the bilinear attention function, respectively; [;]are vector connectors.
A copy mechanism is introduced at the decoding stage to avoid that the decoder ignores some important low frequency words. I.e. the decoder generates two distributions:
Figure BDA0003393414440000127
Figure BDA0003393414440000128
Figure BDA0003393414440000129
P(ot)=(1-μt)Pv(ot=wv)+μtPc(ot=wc)
wherein, mutA threshold value between 0 and 1; sigma (·), MLP (·) and copy (·) are respectively a Sigmoid function, a multilayer perceptron function and a copy mechanism function; otIs the output value at the time t; w is avAnd wcRespectively representing words generated from a vocabulary and generated according to a copy mechanism; subscripts v and c represent bar identifiers generated from the vocabulary and according to the copy mechanism, respectively; [;]are vector connectors.
The loss function of the decoder can be expressed by:
Figure BDA00033934144400001210
wherein L isqIs a loss function of the decoder; log (-) is a logarithmic function with a natural constant e as a true number; y istIs the output value of the decoder at the moment t; | y | is the length of the decoder output sequence; x is an input sequence; Γ is the set of knowledge-graph fact triple vectors.
And (7) constructing a knowledge matching module. The invention designs an additional knowledge matching module to generate a problem with more relevant semantics. In a question-and-answer scenario, questions and answers tend to have strong semantic relevance and both expand based on certain facts. Given set of knowledge-graph fact triplets vectors
Figure BDA00033934144400001211
And is
Figure BDA00033934144400001212
Has fi=(hi,ri,ti) Wherein, | Γ | is the number of elements in the triple set Γ; h isi、riAnd tiAre triplets f respectivelyiCorresponding head, relation and tail vectors; h isiAnd tiHas a dimension of de,riHas a dimension of dr. Thus, the set Γ may constitute a knowledge matrix
Figure BDA0003393414440000131
In the encoder-decoder architecture, the hidden layer state of the encoder can be regarded as prior information, and the hidden layer state of the decoder can be regarded as posterior knowledge, for which, the knowledge matching module calculates two distributions fusing external information, namely prior distribution zetapriorAnd posterior distribution ζpostAnd the dimensions of the two distributions are both | Γ |, the ith dimension of the two distributions represents the attention degree of the model to the ith fact triple.
Figure BDA0003393414440000132
Figure BDA0003393414440000133
Wherein, Softmax (-) and tanh (-) are Softmax function and tanh function respectively; wF、WpriorAnd WpostAre all parameter matrices; [;]is a vector connector; an indicator indicating a multiplication of a corresponding position of the matrix; | x | is the length of the input sequence x; | y | is the length of the decoder output sequence y;
Figure BDA0003393414440000134
is the attention vector at decoder y instant.
And finally, taking Jensen-Shannon Divergence (JS Divergence) of the prior distribution and the posterior distribution as a loss function of the knowledge matching module:
Lk=JS(ζprior||ζpost)
wherein L iskA loss function for the knowledge matching module; JS (| |) is the JS divergence function.
And (8) constructing a semantic search space matching module. The invention designs an additional auxiliary task, namely a semantic search space matching module, so as to calculate the semantic similarity of the question and the answer. The parameter matrix W obtained in step (7)FAnd WpostContains a large amount of a posteriori information, i.e. semantic information related to the problem. For this purpose, the invention designs two mapping functions
Figure BDA0003393414440000135
And ψ (-) to project the question vector and the answer vector to the corresponding semantic search spaces, respectively, fig. 2 is a schematic diagram of a model in which the question vector and the answer vector are projected to the semantic search spaces:
Figure BDA0003393414440000136
Figure BDA0003393414440000137
Figure BDA0003393414440000138
wherein e isaveThe average value of all word embedding vector splicing vectors of the word embedding layer is obtained; x is the number oft、gt、mtAnd ltRespectively representing a word embedding vector at the time t, a corresponding one-hop static graph attention vector, an answer position information vector and a vocabulary characteristic information vector; ws、WFAnd WpostAre all parameter matrices;
Figure BDA0003393414440000139
and
Figure BDA00033934144400001310
the space vectors are searched for the semantics corresponding to the question and the answer, respectively. In addition, there are:
Figure BDA0003393414440000141
Figure BDA0003393414440000142
wherein the content of the first and second substances,
Figure BDA0003393414440000143
and ψ (·) is a mapping function for the question and answer sequences, respectively;
Figure BDA0003393414440000144
is L-2 norm.
Finally, calculation was done by Kullback-Leibler Divergence (KL Divergence)
Figure BDA0003393414440000145
And
Figure BDA0003393414440000146
distance in the projected semantic search space and as a loss function of this module:
Figure BDA0003393414440000147
wherein L issLoss function of this module, DKL(| |) is the KL divergence function.
And (9) constructing a query word prediction module. The present invention designs an additional auxiliary task, namely, a query word prediction module, to predict the query words corresponding to the input text, as shown in fig. 3. The query word prediction module is composed of a classifier, the input of the classifier is an answer label sequence, and a TextCNN model is adopted as the classifier of the query words. Given answer label sequence
Figure BDA0003393414440000148
Wherein x isaA starting word representing a sequence of answer labels; | xansIf | represents the length of the answer label sequence, the output probability distribution of the predicted query word is calculated as follows:
PTextCNN(ut=sk)=Softmax(TextCNN(xans))
wherein u istAs output of the classifier, skIs the label value in the classified label set; TextCNN (·) is a TextCNN network function.
Finally, the cross entropy is used as the loss function of the module:
Figure BDA0003393414440000149
wherein Lr is a loss function of the module; log (-) is a logarithmic function with a natural constant e as a true number;
Figure BDA00033934144400001410
is the answer xansCorresponding real query word tags.
To this end, the final loss function of the present invention can be calculated by:
L=λq·Lqk·Lks·Lsr·Lr
wherein L is the final loss function; lambda [ alpha ]q、λk、λsAnd λrAre respectively Lq、Lk、LsAnd LrThe weight coefficient of (2).
In the steps (2) and (3), a static attention vector based on a one-hop knowledge graph is introduced into a model encoder part, so that a decoder is promoted to generate a problem that semantic information is richer. Meanwhile, the attention vector of the one-hop static graph is introduced, so that the waste of computing resources and under-fitting caused by using a global large-scale knowledge graph can be avoided, the potential semantic information of sentences can be greatly captured, and the quality of a system model is improved.
In the step (4), the input layer of the encoder not only considers the word embedding vector, but also considers the attention vector of the one-jump static image, the position information of the word and the vocabulary characteristics, so that the context semantic characteristics of the input text are greatly enriched, and the encoder is promoted to obtain richer semantic information.
In the step (5), the encoder part adopts a gated self-attention mechanism, the self-attention mechanism can dynamically and adaptively acquire the semantic information of the context, and the key point is that the weight of the self-attention is dynamically calculated, so that the model can pay attention to different context words with a tendency according to the weights of the different weights, and the performance of the model for acquiring the context information is greatly improved. The gating mechanism ensures that the final vector generated by the encoder can reach balance between the hidden layer state and the output result of the self-attention mechanism, avoids the excessive attention of the model to certain context, and further improves the generalization capability of the model.
In the step (6), an attention mechanism is adopted in the decoding process, so that the problem of information loss caused by overlong input sequences is effectively avoided, and the performance of the encoder is further improved.
In the step (7), an additional auxiliary task module, namely a knowledge matching module, is introduced to promote that the model can focus on the fact information most relevant to the question and answer. Two kinds of distribution are introduced, the first kind is prior distribution introducing external knowledge, the second kind is posterior distribution introducing external knowledge and decoder hidden layer state information, and similarity degree between the two kinds of distribution is evaluated through JS divergence, thereby promoting that the problem generated by the system can be closer to relevant facts.
In said step (8), an additional auxiliary task module, i.e. a semantic search space matching module, is introduced to make the generated question semantically as close as possible to the input text.
In the step (9), an additional auxiliary task module, namely a query word prediction module is introduced, and the module is composed of a simple classifier and aims to predict the query words in the encoder part and promote the decoder to generate more pertinent problems.
The experimental data set is a Stanford QuestionAnswering Dataset (SQuAD) question-answer data set, and 5 experiments are supplemented to prove the effectiveness of the system, wherein the effectiveness comprises the following steps:
(1) and automatically evaluating the experiment, wherein the specific indexes are BLEU-1, BLEU-2, BLEU-3, BLEU-4, Rouge-L and Meteor.
(2) The evaluation indexes of the model self-ablation experiment are BLEU-1, BLEU-2, BLEU-3, BLEU-4, Rouge-L and Meteor.
The model in the invention is named as KBIDN (knowledge Graph Based and iterative Word drive network), and the baseline models participating in experimental comparison are Seq2Seq, Seq2Seq + Att, NQG + +, s2s-a-at-mcp-gsa, Q-drive and ASs2 s. The models participating in the ablation experiment are KBIDN, KBIDN-w/o OSGA, TBGAN-w/o LF and TBGAN-w/o GSA. TBGAN-w/o CP, TBGAN-w/o K & S, TBGAN-w/o SSSM and TBGAN-w/o IWP represent ablation submodels for removing a one-hop static map attention mechanism, removing lexical feature information, removing a gated self-attention mechanism layer, removing a copy mechanism, simultaneously removing a knowledge matching module and a semantic search space matching module, removing a semantic search space matching module and removing a query word prediction module, respectively.
(3) Interrogative predictive experiments. The baseline models involved in the experiment were NQG + + and Q-drive.
(4) And (3) researching an influence experiment of beam search on Seq2Seq decoding, wherein evaluation indexes are BLEU-1, BLEU-2, BLEU-3, BLEU-4, Rouge-L and Meteor.
(5) Sample analysis experiments, the baseline models involved in the experiments were NQG + + and ASs2 s.
TABLE 1 Performance of the model in the automated assessment experiment
Figure BDA0003393414440000161
Table 1 shows the experimental results of the proposed model and baseline model automated evaluation. The model provided by the invention is higher than the baseline model in all indexes, and the problem that the model can be generated to be closer to the ground route is fully proved. The reason may be that the model proposed by the present invention introduces a gated self-attention mechanism in the encoder part to improve the ability of the encoder to acquire context semantic information, and meanwhile, a pre-trained one-hop knowledge map-based static attention vector is spliced in the word embedding layer to further integrate external knowledge information into the hidden layer state of the encoder. In addition, the knowledge matching module further promotes the generation of the distance between the problem and the group channel in the semantic space by introducing the posterior information, so that the problem closer to the group channel can be generated. After the attention mechanism is introduced to the naive Seq2Seq model, the experimental effect is remarkably improved, and the attention mechanism can fully acquire the context semantic information in the encoder. The Q-drive model fully shows that the multi-task learning technology can improve the performance of the simple model by adding a simple auxiliary task, namely, a predicted question word, and the experimental result of the Q-drive model is higher than s2s-a-at-mcp-gsa with a complex model structure.
TABLE 2 Performance of the model in ablation experiments
Figure BDA0003393414440000162
Figure BDA0003393414440000171
Table 2 problem generation results of systematic ablation experiments. It was found that the overall performance of the model was degraded regardless of which part of the sub-modules was removed. It is worth noting that when the static attention vector based on one hop is removed or the probability distribution of copy is not calculated, the performance of the model is reduced remarkably, the scores of the model on the index BLEU-4 are reduced by 5.9% and 6.2%, respectively, and the problem that the model generation is close to the ground route can be greatly promoted by introducing an OSGA module and a copy mechanism is effectively proved. The reasons may be: 1) a large amount of external knowledge is fused on the basis of the one-hop static attention vector, and the context semantic information of the text input by the encoder is greatly enriched; 2) the copy mechanism can improve the probability of low-frequency words in the probability distribution of the generated words, the BLEU and other indexes are calculated according to the overlapping degree of the words, the higher the overlapping degree is, the higher the score is, and therefore the copy mechanism can also greatly improve the performance of the model. In addition, when the knowledge matching module and the semantic search space matching module are removed, the score of the model on the BLEU-4 index is reduced most remarkably, and the amplitude is 10.9%, because the knowledge matching module utilizes a large amount of posterior knowledge to improve the coding performance of the coder in the training stage so as to generate a problem which is more close to ground route; and when only the semantic search space matching module is removed, the scores of all indexes slightly rise, and the effectiveness of the knowledge matching module is laterally proved. Finally, the fact that the influence of the gating self-attention mechanism removal module or the query word prediction module on the model is small is found, the reduction amplitude on the BLEU-4 is 2.9% and 2.2% respectively, and the reason is probably that under the multi-task learning framework, the richness of the gating self-attention mechanism for acquiring the context semantic information is reduced due to the fact that the generalization capability of the model is improved.
TABLE 3 percentage of doubtful words in SQuAD dataset
Figure BDA0003393414440000172
TABLE 4 accuracy rate of question words predicted by the question generation System
Figure BDA0003393414440000173
Table 3 shows the percentage of each query in the SQuAD data set. Table 4 shows the accuracy of the proposed and baseline models to predict the interrogative words. The accuracy of the KBIDN model provided by the invention on most of doubtful word prediction is slightly higher than that of a Q-drive model and is far higher than that of an NQG + + model. The reason may be that the present invention treats the predictive interrogatories as a separate subtask and trains them jointly with Seq2Seq and other auxiliary tasks. During the training process, the model can continuously improve the prediction capability for generating the questioning words and promote the decoder to generate more pertinent problems. The NQG + + model does not design additional auxiliary tasks for the questioning words, so that the prediction accuracy rate of the NQG + + model on most of the questioning words is low, but it is worth noting that the NQG + + model has the highest accuracy rate on the questioning words of the type "why", because the NQG + + model considers the conversion relation among the questioning words, for example, "why" can be replaced by "for what", so that the NQG + + model has a better prediction effect on the type "why". Although the Q-drive model also adopts a similar idea to construct an independent auxiliary task to learn the relevant information of the query word, the encoder of the Q-drive model is inferior to KBIDN in the capability of acquiring the context semantic information because external knowledge is not introduced to enrich the context information, so that the Q-drive model cannot concentrate on the entity or knowledge relevant to the input text, and the prediction result is influenced.
TABLE 4 comparison of greedy and bundle searches on the TBIDN model
Figure BDA0003393414440000181
In the decoding process of Seq2Seq, a word corresponding to the maximum value of the word probability distribution at each moment is often output, and the algorithm is a greedy search algorithm. However, greedy search has a certain limitation, because the output of each step of the decoder depends on the input of the previous step, if only the word with the highest probability is selected, the model is easily trapped in the local optimal solution, and the global optimal solution cannot be guaranteed. To alleviate this problem, it is conceivable to use a beam search algorithm, the core idea of which is to generate k most probable sequences at each decoding time step, followed by k | V at each subsequent stepbsSelecting k most likely sequences from the | candidates, where k and VbsRespectively representing the size of the bundle and the vocabulary, | Vbs| represents the size of the vocabulary. As shown in table 4, 3 different values of k were selected for the experiment and compared to the greedy search. Obviously, the scores of the sequences decoded by the beam search algorithm are higher than the scores of the sequences decoded by the greedy search algorithm on each index, and the experimental results are better along with the increase of the k value, which fully shows that the beam search algorithm can enable a decoder to jump out of a local optimal solution and search the range of a solution space as much as possible so as to achieve the aim of obtaining the optimal solutionAnd (4) global optimization. On the other hand, although the model performance improves with increasing beam size k, the decoder decoding complexity increases.
TABLE 5 Experimental results of sample analysis of problem Generation System
Figure BDA0003393414440000182
Table 5 shows the experimental results of the sample analysis of the problem generation model, and the NQG + + and ASs2s models were selected as baseline models for this experiment. The answer label is "Engineering News-Record", which is the name of a magazine, so the question is asked using the "what" beginning. Both ASs2s and KBIDN generate a problem beginning with "what" and the semantics are close to ground route, proving the validity of the two models for simple input sentences. The problem generated by the KBIDN also comprises an important entity "index", and semantic information of the generated problem is further enriched. On the other hand, the NQG + + generated question begins with "in which", stating that it does not capture the true information of the answer label. It is noted that the answer label "engineering news-Record" is included in each of the three model-generated questions, further demonstrating the importance and effectiveness of the copy mechanism in the question generation system.
Parts of the invention not described in detail are well known in the art. The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the scope of the specific embodiments, and it is obvious to those skilled in the art that various changes are made within the spirit and scope of the present invention defined and determined by the claims, and all the inventions utilizing the inventive concept are protected.

Claims (11)

1. A knowledge-graph and question-word-driven question generation system, comprising:
the text preprocessing module is used for preprocessing the text;
the one-hop knowledge map construction module is used for constructing a one-hop knowledge map based on the preprocessed text;
the attention vector calculation module is used for calculating a static map attention vector based on a one-hop knowledge map;
the characteristic-enhanced encoder is characterized in that a one-hop knowledge map is converted into a one-dimensional static word embedding vector, and then the one-dimensional static word embedding vector is spliced with a word embedding vector of an input text, an answer position information vector of the input text and a vocabulary characteristic information vector to serve as net input of the encoder so as to enhance the capability of the encoder for acquiring context semantic information;
the gated attention mechanism module is used for storing additional context semantic information of the encoder part and further expanding the context semantic of the input text;
the decoder is used for decoding the intermediate state one-dimensional vector coded by the coder so as to output final word probability distribution;
a knowledge matching module; to generate semantically more relevant questions;
the semantic search space matching module is used for calculating the semantic similarity of the question and the answer;
and the query word prediction module is used for predicting the query words corresponding to the input text.
2. A problem generation method based on knowledge graph and question word drive is characterized by comprising the following steps:
step (1), preprocessing a text;
step (2), constructing a one-hop knowledge graph based on the preprocessed text;
step (3), calculating a static graph attention vector based on a one-hop knowledge graph;
step (4), constructing a characteristic-enhanced encoder, namely firstly converting a one-hop knowledge map into a one-dimensional static word embedding vector, and then splicing the one-dimensional static word embedding vector with a word embedding vector of an input text, an answer position information vector of the input text and a vocabulary characteristic information vector to be used as net input of the encoder so as to enhance the capability of the encoder for acquiring context semantic information;
constructing a gating self-attention mechanism module to store additional context semantic information of the encoder part and further expand the context semantic of the input text;
step (6), constructing a decoder, which is used for decoding the intermediate state one-dimensional vector coded by the coder so as to output the final word probability distribution;
step (7), constructing a knowledge matching module; to generate semantically more relevant questions;
step (8), constructing a semantic search space matching module to calculate the semantic similarity of the question and the answer;
and (9) constructing a query word prediction module to predict the query words corresponding to the input text.
3. The method for generating question based on knowledge-graph and question word driving according to claim 2, characterized in that the step (1) of preprocessing the text comprises the following specific steps:
processing the text formats in a unified way, namely processing all texts, deleting redundant spaces before, after and in the middle of the texts, and removing non-English letter symbols; encoding each Word into a multi-dimensional Word embedding form by adopting Global Vectors for Word retrieval (GloVe) encoding, wherein the vocabulary size of the GloVe is selected as NGThe unknown word is expressed as<UNK>。
4. The problem generation method based on knowledge graph and query term driving according to claim 2, wherein the step (2) of constructing a one-hop knowledge graph comprises the following specific steps:
selecting a ConceptNet large-scale common sense map as a knowledge base, searching a one-hop node in the common sense map for each word aiming at the input text, fixing the number of the nodes to be 60, adopting a bottom-bound triple NOT _ A _ FACT to represent the triple which is NOT matched with any entity, obtaining the one-hop knowledge map formed by the triples, and simultaneously keeping a copy of the one-hop knowledge map.
5. The method of claim 2, wherein the step (3) of calculating a one-hop-based histogram-based static graph attention vector, matching each word in the input sentence to its corresponding multi-hop graph, and converting the multi-hop graph into a corresponding static graph attention vector for input into the encoder structure, let K ═ { K ═ K { (K) } K1,...,k|K|The triple vector of the knowledge map is represented as ki=(hi,ri,ti) Where, | K | represents the number of triples in the set K, hi、riAnd tiAre respectively triplets kiCorresponding head, relation and tail vectors, i ∈ [1, | K](ii) a In order to obtain the graph embedding vector corresponding to the one-hop knowledge graph, a one-hop knowledge graph set is obtained firstly
Figure FDA0003393414430000021
Wherein x is an input sequence; | x | is the length of the input sequence; one represents the relevant symbol of the one-hop knowledge-graph,
Figure FDA0003393414430000022
one-hop knowledge graph at time t
Figure FDA0003393414430000023
The corresponding three tuple sets are
Figure FDA0003393414430000024
Wherein the content of the first and second substances,
Figure FDA0003393414430000025
is a set
Figure FDA0003393414430000026
The number of elements contained in the steel;
Figure FDA0003393414430000027
is provided with
Figure FDA0003393414430000028
Figure FDA0003393414430000029
And
Figure FDA00033934144300000210
j-th triplet at respective time t
Figure FDA00033934144300000211
The included head, relationship and tail vectors; calculating the final one-jump static map attention vector g at the time t by the following formulat
Figure FDA00033934144300000212
Figure FDA00033934144300000213
Wherein, gtRepresenting the attention vector of the one-jump static graph corresponding to the input at the time t; alpha is alphatiThe attention scores of the 0-hop entity and the ith one-hop entity input at the time t; exp (·) is an exponential function with a natural constant e as base; τ (-) is a bilinear attention function; [;]is the concatenation symbol of the vector.
6. The method for generating question based on knowledge-graph and question-word driving as claimed in claim 2, wherein the step (4) of constructing feature-enhanced encoder, the encoder uses a bidirectional Long Short-Term Memory (LSTM) for encoding, and the calculation is as follows:
Figure FDA0003393414430000031
Figure FDA0003393414430000032
Figure FDA0003393414430000033
wherein the content of the first and second substances,
Figure FDA0003393414430000034
and
Figure FDA0003393414430000035
hidden layer vectors respectively representing positive direction decoding and reverse direction decoding of the bidirectional LSTM at the time t; enc is the mark of the encoder, the same as below;
Figure FDA0003393414430000036
the input vector after splicing; x is the number oft、gt、mtAnd ltRespectively representing a word embedding vector at the time t, a corresponding one-hop static graph attention vector, an answer position information vector and a vocabulary characteristic information vector.
7. The knowledge-graph-and-query-word-driven question generation method according to claim 2, wherein said step (5) of constructing a gated self-attention mechanism module first obtains a coder hidden layer vector matrix
Figure FDA0003393414430000037
Wherein the content of the first and second substances,
Figure FDA0003393414430000038
Figure FDA0003393414430000039
representing a two-dimensional set of real number fieldsThe dimension is | x | × 2 d; d is the dimension of the LSTM hidden layer state; | x | is the length of the input sequence x;
Figure FDA00033934144300000310
is provided with
Figure FDA00033934144300000311
[;]Is a vector concatenator; then, a self-attention moment array is obtained by a self-attention algorithm
Figure FDA00033934144300000312
Finally, a gate control unit controls the finally generated self-attention moment array
Figure FDA00033934144300000313
Figure FDA00033934144300000314
Figure FDA00033934144300000315
Figure FDA00033934144300000316
Figure FDA00033934144300000317
Wherein the content of the first and second substances,
Figure FDA00033934144300000318
and
Figure FDA00033934144300000319
respectively representing matrices, fusions, derived from attention algorithmsThe original matrix and the matrix obtained after the attention algorithm and the matrix finally obtained by the gating mechanism, and
Figure FDA00033934144300000320
| x | is the length of the input sequence x; q, K and V are both state parameter matrices calculated from the attention scores; softmax (-), tanh (-), MLP (-), and σ (-), represent the Softmax function, tanh function, the multi-layer perceptron function, and the Sigmoid function, respectively;
Figure FDA00033934144300000321
a connection symbol representing a matrix; an indicator indicating a multiplication of a corresponding position of the matrix; j is a full 1 matrix, and the matrix dimension is consistent with H.
8. A knowledge-graph-and-query-word-driven question generation method according to claim 2, characterized in that said step (6) of constructing a decoder part, the decoder part being constituted by another LSTM, as follows:
Figure FDA0003393414430000041
Figure FDA0003393414430000042
Figure FDA0003393414430000043
wherein the content of the first and second substances,
Figure FDA0003393414430000044
is the hidden layer vector of the decoder at the moment t; dec is a correlation symbol representing the decoder; y ist-1Represents the output vector of the decoder at time t-1;
Figure FDA0003393414430000045
an attention vector representing time t; beta is atjRepresenting the attention score between the hidden layer vector at the moment t of the decoder and the jth input sequence of the encoder;
Figure FDA0003393414430000046
a gated self-attention vector for encoder time j; softmax (-) and τ (-) are the Softmax function and the bilinear attention function, respectively; [;]is a vector connector;
a copy mechanism is introduced in the decoding stage to avoid that the decoder ignores some important low frequency words, i.e. the decoder generates two distributions:
Figure FDA0003393414430000047
Figure FDA0003393414430000048
Figure FDA0003393414430000049
P(ot)=(1-μt)Pv(ot=wv)+μtPc(ot=wc)
wherein, mutA threshold value between 0 and 1; sigma (·), MLP (·) and copy (·) are respectively a Sigmoid function, a multilayer perceptron function and a copy mechanism function; otIs the output value at the time t; w is avAnd wcRespectively representing words generated from a vocabulary and generated according to a copy mechanism; subscripts v and c represent bar identifiers generated from the vocabulary and according to the copy mechanism, respectively; [;]is a vector connector;
the loss function of the decoder is represented by:
Figure FDA00033934144300000410
wherein L isqIs a loss function of the decoder; log (-) is a logarithmic function with a natural constant e as a true number; y istIs the output value of the decoder at the moment t; | y | is the length of the decoder output sequence; x is an input sequence; Γ is the set of knowledge-graph fact triple vectors.
9. The knowledge-graph-and-query-driven question generation method according to claim 2, characterized in that said step (7) of constructing a knowledge matching module to generate semantically more relevant questions; given set of knowledge-graph fact triplets vectors
Figure FDA00033934144300000411
And is
Figure FDA00033934144300000412
Has fi=(hi,ri,ti) Wherein, | Γ | is the number of elements in the triple set Γ; h isi、riAnd tiAre triplets f respectivelyiCorresponding head, relation and tail vectors; h isiAnd tiHas a dimension of de,riHas a dimension of drThus, the set Γ may constitute a knowledge matrix
Figure FDA0003393414430000051
In the coder-decoder structure, the hidden layer state of the coder is regarded as prior information, the hidden layer state of the decoder is regarded as posterior knowledge, and the knowledge matching module calculates the distribution of two kinds of fused external information, namely prior distribution ZetapriorAnd posterior distribution ζpostIf the dimensions of the two distributions are | Γ |, the ith dimensions of the two distributions both represent the attention degree of the model to the ith fact triple;
Figure FDA0003393414430000052
Figure FDA0003393414430000053
wherein, Softmax (-) and tanh (-) are Softmax function and tanh function respectively; wF、WpriorAnd WpostAre all parameter matrices; [;]is a vector connector; an indicator indicating a multiplication of a corresponding position of the matrix; | x | is the length of the input sequence x; | y | is the length of the decoder output sequence y;
Figure FDA0003393414430000054
attention vector for decoder y time instant;
and finally, taking Jensen-Shannon Divergence (JS Divergence) of the prior distribution and the posterior distribution as a loss function of the knowledge matching module:
Lk=JS(ζprior||ζpost)
wherein L iskA loss function for the knowledge matching module; JS (| |) is the JS divergence function.
10. The method for generating question based on knowledge-graph and question-word driving according to claim 2, wherein said step (8) is to construct a semantic search space matching module to calculate semantic similarity between question and answer, and the parameter matrix W obtained in step (7) isFAnd WpostContains a posteriori information, i.e., semantic information related to the question; designing two mapping functions;
Figure FDA0003393414430000055
and ψ (-) to project the question vector and answer vector to the corresponding semantic search spaces, respectively:
Figure FDA0003393414430000056
Figure FDA0003393414430000057
Figure FDA0003393414430000058
wherein e isaveThe average value of all word embedding vector splicing vectors of the word embedding layer is obtained; x is the number oft、gt、mtAnd ltRespectively representing a word embedding vector at the time t, a corresponding one-hop static graph attention vector, an answer position information vector and a vocabulary characteristic information vector; ws、WFAnd WpostAre all parameter matrices;
Figure FDA0003393414430000059
and
Figure FDA00033934144300000510
the space vectors are searched for the semantics corresponding to the question and the answer, respectively, and in addition, there are:
Figure FDA0003393414430000061
Figure FDA0003393414430000062
wherein the content of the first and second substances,
Figure FDA0003393414430000063
and ψ (·) is a mapping function for the question and answer sequences, respectively;
Figure FDA0003393414430000064
is L-2 norm;
finally, calculation was done by Kullback-Leibler Divergence (KL Divergence)
Figure FDA0003393414430000065
And
Figure FDA0003393414430000066
distance in the projected semantic search space and as a loss function of this module:
Figure FDA0003393414430000067
wherein L issLoss function of this module, DKL(| |) is the KL divergence function.
11. The question generating method based on knowledge graph and query term driving according to claim 2, characterized in that, the step (9) of constructing a query term prediction module to predict the query terms corresponding to the input text, wherein the query term prediction module is composed of a classifier, the input of the classifier is an answer label sequence, and a TextCNN model is used as the classifier of the query terms; given answer label sequence
Figure FDA0003393414430000068
Wherein x isaA starting word representing a sequence of answer labels; | xansIf | represents the length of the answer label sequence, the output probability distribution of the predicted query word is calculated as follows:
PTextCNN(ut=sk)=Softmax(TextCNN(xans))
wherein u istAs output of the classifier, skIs the label value in the classified label set; TextCNN (·) is a TextCNN network function;
finally, the cross entropy is used as the loss function of the module:
Figure FDA0003393414430000069
wherein Lr is a loss function of the module; log (-) is a logarithmic function with a natural constant e as a true number;
Figure FDA00033934144300000610
is the answer xansA corresponding real query word label;
the final loss function is calculated by:
L=λq·Lqk·Lks·Lsr·Lr
wherein L is the final loss function; lambda [ alpha ]q、λk、λsAnd λrAre respectively Lq、Lk、LsAnd LrThe weight coefficient of (2).
CN202111475261.5A 2021-12-06 2021-12-06 Question generation system based on knowledge graph and question word drive Pending CN114168749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111475261.5A CN114168749A (en) 2021-12-06 2021-12-06 Question generation system based on knowledge graph and question word drive

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111475261.5A CN114168749A (en) 2021-12-06 2021-12-06 Question generation system based on knowledge graph and question word drive

Publications (1)

Publication Number Publication Date
CN114168749A true CN114168749A (en) 2022-03-11

Family

ID=80483493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111475261.5A Pending CN114168749A (en) 2021-12-06 2021-12-06 Question generation system based on knowledge graph and question word drive

Country Status (1)

Country Link
CN (1) CN114168749A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114547273A (en) * 2022-03-18 2022-05-27 科大讯飞(苏州)科技有限公司 Question answering method and related device, electronic equipment and storage medium
CN114780749A (en) * 2022-05-05 2022-07-22 国网江苏省电力有限公司营销服务中心 Electric power entity chain finger method based on graph attention machine mechanism
CN114936296A (en) * 2022-07-25 2022-08-23 达而观数据(成都)有限公司 Indexing method, system and computer equipment for super-large-scale knowledge map storage
CN115062587A (en) * 2022-06-02 2022-09-16 北京航空航天大学 Knowledge graph embedding and reply generation method based on surrounding information
CN115357705A (en) * 2022-10-24 2022-11-18 成都晓多科技有限公司 Method, device and equipment for generating entity attribute in question text and storage medium
CN116681087A (en) * 2023-07-25 2023-09-01 云南师范大学 Automatic problem generation method based on multi-stage time sequence and semantic information enhancement
CN117610513A (en) * 2024-01-22 2024-02-27 南开大学 Knowledge protection and selection-based theme text generation method
CN115062587B (en) * 2022-06-02 2024-05-31 北京航空航天大学 Knowledge graph embedding and replying generation method based on surrounding information

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114547273A (en) * 2022-03-18 2022-05-27 科大讯飞(苏州)科技有限公司 Question answering method and related device, electronic equipment and storage medium
CN114547273B (en) * 2022-03-18 2022-08-16 科大讯飞(苏州)科技有限公司 Question answering method and related device, electronic equipment and storage medium
CN114780749A (en) * 2022-05-05 2022-07-22 国网江苏省电力有限公司营销服务中心 Electric power entity chain finger method based on graph attention machine mechanism
CN115062587A (en) * 2022-06-02 2022-09-16 北京航空航天大学 Knowledge graph embedding and reply generation method based on surrounding information
CN115062587B (en) * 2022-06-02 2024-05-31 北京航空航天大学 Knowledge graph embedding and replying generation method based on surrounding information
CN114936296A (en) * 2022-07-25 2022-08-23 达而观数据(成都)有限公司 Indexing method, system and computer equipment for super-large-scale knowledge map storage
CN115357705A (en) * 2022-10-24 2022-11-18 成都晓多科技有限公司 Method, device and equipment for generating entity attribute in question text and storage medium
CN116681087A (en) * 2023-07-25 2023-09-01 云南师范大学 Automatic problem generation method based on multi-stage time sequence and semantic information enhancement
CN116681087B (en) * 2023-07-25 2023-10-10 云南师范大学 Automatic problem generation method based on multi-stage time sequence and semantic information enhancement
CN117610513A (en) * 2024-01-22 2024-02-27 南开大学 Knowledge protection and selection-based theme text generation method
CN117610513B (en) * 2024-01-22 2024-04-02 南开大学 Knowledge protection and selection-based theme text generation method

Similar Documents

Publication Publication Date Title
CN108734276B (en) Simulated learning dialogue generation method based on confrontation generation network
CN114168749A (en) Question generation system based on knowledge graph and question word drive
CN108763510B (en) Intention recognition method, device, equipment and storage medium
Tan et al. Lstm-based deep learning models for non-factoid answer selection
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
US20180329884A1 (en) Neural contextual conversation learning
CN108780464A (en) Method and system for handling input inquiry
CN109992669B (en) Keyword question-answering method based on language model and reinforcement learning
EP3913521A1 (en) Method and apparatus for creating dialogue, electronic device and storage medium
CN111897944B (en) Knowledge graph question-answering system based on semantic space sharing
CN114428850B (en) Text retrieval matching method and system
US20230395075A1 (en) Human-machine dialogue system and method
CN113297364A (en) Natural language understanding method and device for dialog system
CN116127095A (en) Question-answering method combining sequence model and knowledge graph
CN115495552A (en) Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment
CN115688879A (en) Intelligent customer service voice processing system and method based on knowledge graph
CN113971394A (en) Text repeat rewriting system
CN115186147B (en) Dialogue content generation method and device, storage medium and terminal
Kasai et al. End-to-end graph-based TAG parsing with neural networks
CN114429143A (en) Cross-language attribute level emotion classification method based on enhanced distillation
Liu Neural question generation based on Seq2Seq
CN116010553A (en) Viewpoint retrieval system based on two-way coding and accurate matching signals
CN115964459B (en) Multi-hop reasoning question-answering method and system based on food safety cognition spectrum
CN114328866A (en) Strong anthropomorphic intelligent dialogue robot with smooth and accurate response
CN112463935B (en) Open domain dialogue generation method and system with generalized knowledge selection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination