CN111813913A - Two-stage problem generation system with problem as guide - Google Patents

Two-stage problem generation system with problem as guide

Info

Publication number
CN111813913A
CN111813913A (application CN202010661187.5A)
Authority
CN
China
Prior art keywords
context
sequence
answer
encoder
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010661187.5A
Other languages
Chinese (zh)
Other versions
CN111813913B (en)
Inventor
沈耀
倪茂森
过敏意
姚斌
陈�全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010661187.5A priority Critical patent/CN111813913B/en
Publication of CN111813913A publication Critical patent/CN111813913A/en
Application granted granted Critical
Publication of CN111813913B publication Critical patent/CN111813913B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Educational Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A question-guided two-stage question generation system, comprising a question-answer data preprocessing module, a context sequence labeling module and a question generation module, wherein: the question-answer data preprocessing module re-divides the data set, extracts features, builds a dictionary, and vectorizes the features and words to obtain a labeled training set and real labels; the context sequence labeling module trains a network model on the labeled data set and produces predicted labels for the context; the question generation module takes the real labels and the predicted labels as input to generate a predicted question sequence, and is trained by back-propagating the error between the real and predicted sequences to obtain the final maximum-probability question. The invention achieves clear improvements on the BLEU, METEOR and ROUGE-L metrics.

Description

Two-stage problem generation system with problem as guide
This application is a divisional of application No. 201911179784.8, filed on 2019/11/27, entitled "Two-stage question generation system with question as guide".
Technical Field
The invention relates to a technology in the field of natural language processing, and in particular to a question-guided two-stage question generation system.
Background
Question Generation (QG) aims to generate questions from various natural language texts and plays a crucial role in natural language generation. In recent years, question generation has attracted increasing attention because of its wide range of applications. The most direct application is to expand the data sets of question-answering tasks and thereby improve their performance. Generated questions can also help readers assess how well they have mastered a passage and point out what they may have missed while reading, which is of great significance in education for easing the burden on teachers and improving teaching quality. Furthermore, in conversational systems, smooth communication often relies on asking reasonable questions, and question generation has become an important component of existing dialogue systems (e.g. Siri, Alexa and Cortana).
The question generation task is the symmetric counterpart of the question answering task: given a context and an answer, it produces a valid question. Many existing end-to-end networks perform well on question generation, but they share two shortcomings: 1) they do not fully exploit the questions in the data set, which are used only as labels for computing the loss; 2) they do not use the answer effectively, merely fusing the answer into the context with 0/1 or BIO position tags.
Disclosure of Invention
To address the insufficient use of questions and the insufficient attention to answers in the prior art, the invention provides a question-guided two-stage question generation system.
the invention is realized by the following technical scheme:
the invention relates to a problem-oriented two-stage problem generation system, comprising: the system comprises a question-answer data preprocessing module, a context sequence labeling module and a question generating module, wherein: the question-answer data preprocessing module performs re-division, feature extraction and dictionary construction on the data set and vectorizes features and words to obtain a labeling training set and a real label; the context sequence marking module adopts the marked data set to train a network model and obtain a prediction label of a context; and the problem generation module generates a prediction problem sequence by taking the real label and the prediction label as input, and performs back propagation training to obtain a final maximum probability prediction problem through an error of the real label and the prediction label.
The invention also relates to a question-guided two-stage question generation method based on the above system, which comprises the following stages:
the first stage, based on the LSTM-CRF network, of which the inputs are a separate context encoder and answer encoder, notes in context the words that may be included in the question, where: the context encoder outputs the attribute to the output of the answer encoder to fuse answer information to obtain a fusion matrix H, and finally a sequence mark of a context is generated through a feedforward structure.
In the second stage, the sequence labels produced in the first stage are vectorized and concatenated with the fusion matrix H as the encoder input, and a gated self-attention mechanism is applied to the encoder output to promote information fusion over long contexts. During decoding, both the encoder output and the first-stage answer encoder output are attended to, and a copy mechanism copies words from the context, finally obtaining the question generated by the question-guided two-stage process.
Technical effects
Compared with the prior art, the generated questions are more relevant to their answers, and the word-overlap evaluation metrics (BLEU, METEOR, ROUGE-L) reach the best results among existing models; encoding the answer separately effectively decouples the answer information, and re-attending to this low-level semantic information during decoding reduces the loss of answer information; the two-stage design of context sequence labeling followed by question generation first marks the context words to be used and then generates out-of-context words and organizes the semantics, completing question generation more efficiently.
The encoder separates the context from the answer, which makes information fusion within the network more convenient and increases the attention the generated question pays to the answer. The invention adopts a two-stage scheme: the first stage marks whether each word of the context appears in the question to obtain an additional vectorized feature; in the second stage the encoder uses a gated self-attention structure to strengthen the fusion of long-text context information, and the decoder attends not only to the encoder output but also to the answer encoder output; the BLEU, METEOR and ROUGE-L scores of the generated questions reach the current state of the art.
Drawings
FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is a schematic diagram of the LSTM-CRF network of the present invention;
FIG. 3 is a schematic diagram of the question generation module of the present invention.
Detailed Description
As shown in fig. 1, the present embodiment relates to a two-stage question generation system based on an end-to-end network, which comprises a question-answer data preprocessing module, a context sequence labeling module and a question generation module, wherein: the question-answer data preprocessing module re-divides the data set, extracts features, builds a dictionary, and vectorizes the features and words to obtain a labeled training set and real labels; the context sequence labeling module trains a network model on the labeled data set and produces predicted labels for the context; the question generation module takes the real labels and the predicted labels as input to generate a predicted question sequence, and is trained by back-propagation on the error between the real and predicted sequences to obtain the final maximum-probability question.
The experimental data for this embodiment come from the Stanford question answering data set SQuAD, which collected more than 100,000 question-answer pairs in a crowd-sourced manner. The data set used Wikipedia's internal ranking system to obtain the top ten thousand high-quality articles, randomly drew 536 of them, extracted paragraphs from the drawn articles, deleted images and tables, discarded paragraphs of fewer than 500 characters, and thus obtained 23,215 paragraphs. Crowd workers then asked questions about the paragraphs, where each answer is a span of the context.
The test environment of this embodiment is a single NVIDIA Titan RTX; the deep learning framework is PyTorch 1.1.0 and the CUDA version is 10.0.130.
The re-division is as follows: the SQuAD training set contains about 87,000+ examples, the validation set about 10,000+, and the test set also about 10,000+ but is not publicly released, so the validation set is split in half, one half used for validation and the other for testing.
The feature extraction and dictionary construction comprise the following steps: since the pre-trained word vectors used are GloVe, all words in the divided training set are first counted, and the words whose frequency exceeds a frequency threshold and that are contained in GloVe form a set; the tokens <UNK>, <PAD>, <S>, </S> (unknown word, padding, start symbol and end symbol) are then added to form the dictionary of this embodiment. The context sequence, question sequence and answer sequence are converted into indices over this dictionary, denoted W_c, W_q and W_ans respectively. Meanwhile, the spaCy toolkit is used to perform named entity recognition (NER) on the context sequence, yielding a sequence denoted W_ner, and part-of-speech (POS) tagging, yielding a sequence denoted W_pos. Finally, the lemmas or word forms of non-stop words of the context that appear in the question are labeled W_emerge.
The word frequency threshold value in this embodiment is 4, which can be customized in different situations.
The feature and word vectorization is as follows: the dictionary indices W_c and W_ans of the context and answer sequences obtained after dictionary and feature construction are vectorized with the GloVe pre-trained word vectors; the named entity sequence W_ner and the part-of-speech sequence W_pos are vectorized with randomly initialized embeddings; and the labels W_emerge of non-stop context words that appear in the question, together with the question sequence indices W_q, serve as the real labels for the context sequence labeling module and the question generation module respectively.
There are 8 named entity types and 12 part-of-speech categories, so the corresponding embedding dimensions are 8 and 12, since each dimension equals the number of categories of the feature.
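A minimal sketch of the preprocessing described above (dictionary construction, NER/POS features and the "appears in question" labels) is given below. The function names, the spaCy model "en_core_web_sm" and the GloVe loading interface are assumptions for illustration, not the exact implementation of this embodiment.

```python
# Preprocessing sketch: vocabulary over frequent GloVe-covered words, spaCy NER/POS
# features, and labels marking which non-stop context words appear in the question.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")          # assumed spaCy English model
SPECIALS = ["<PAD>", "<UNK>", "<S>", "</S>"]

def build_vocab(train_contexts, glove_words, freq_threshold=4):
    """Keep words above the frequency threshold that also appear in GloVe."""
    counts = Counter(tok for sent in train_contexts for tok in sent)
    words = [w for w, c in counts.items() if c > freq_threshold and w in glove_words]
    itos = SPECIALS + sorted(words)
    return {w: i for i, w in enumerate(itos)}, itos

def to_indices(tokens, stoi):
    """Convert a token sequence into dictionary indices (W_c / W_q / W_ans)."""
    unk = stoi["<UNK>"]
    return [stoi.get(t, unk) for t in tokens]

def context_features(context_tokens, question_tokens):
    """NER tags (W_ner), POS tags (W_pos) and 'appears in question' labels (W_emerge)."""
    doc = nlp(" ".join(context_tokens))
    w_ner = [tok.ent_type_ or "O" for tok in doc]
    w_pos = [tok.pos_ for tok in doc]
    q_lemmas = {tok.lemma_ for tok in nlp(" ".join(question_tokens)) if not tok.is_stop}
    w_emerge = [1 if (not tok.is_stop and tok.lemma_ in q_lemmas) else 0 for tok in doc]
    return w_ner, w_pos, w_emerge
```

Note that spaCy re-tokenizes the joined string, so in practice the feature sequences would need to be aligned back to the original tokenization; the sketch omits this step.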
As shown in fig. 1 and fig. 2, the context sequence labeling module includes: a separate dual input encoder, a feed-forward network structure, a Conditional Random Field (CRF) structure.
The separate dual-input encoder comprises a context encoder and an answer encoder, which pass the context sequence and the answer sequence through two different two-layer bidirectional LSTM encoders to obtain two vectors S_c and S_a as the context state vector and the answer state vector:
S_c = [→LSTM_c([g_i; f_i]); ←LSTM_c([g_i; f_i])],  S_a = [→LSTM_a(g_i); ←LSTM_a(g_i)]
wherein: g_i is the GloVe pre-trained word vector; f_i is the additional feature information, containing the named entity and part-of-speech information in the context encoder, while the answer encoder considers only the pre-trained word vectors and f_i is empty; the arrows indicate the direction of the recurrent network; [;] denotes concatenation along the final dimension of the two vectors.
So that the context encoder can perceive the current answer information, this embodiment uses an attention mechanism to fuse the answer information into the context, obtaining the fusion matrix H. The general form of the attention mechanism is Attention(Q, K, V) = softmax((W_Q·Q)·(W_K·K)^T)·(W_V·V), and H = Attention(S_a, S_c, S_c), where W_Q, W_K and W_V are trainable parameter matrices.
The feedforward network structure is a fully connected network comprising two linear transformations, a ReLU activation, a residual connection and layer normalization, specifically: FFN(H) = LayerNorm(ReLU(H·W_1 + b_1)·W_2 + b_2 + H)·W_3, wherein the two linear transformations are implemented with one-dimensional convolutions whose input and output are 600-dimensional and whose intermediate dimension is 2400, i.e. W_1 ∈ R^(600×2400) and W_2 ∈ R^(2400×600), and W_3 projects onto the label space; the probability of each label of the context sequence is finally obtained.
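A minimal PyTorch sketch of this stage-1 structure follows: separate two-layer BiLSTM context and answer encoders, attention-based fusion into H, and the feedforward tagging head. Module names, hidden sizes other than 600/2400, the number of tags, and the use of plain linear layers in place of kernel-1 one-dimensional convolutions are assumptions; the attention is written with the context as query attending to the answer, which is one reading of the fusion described in the text.

```python
import torch
import torch.nn as nn

class DotAttention(nn.Module):
    """Attention(Q, K, V) = softmax((WQ·Q)(WK·K)^T)(WV·V), as in the text."""
    def __init__(self, dim):
        super().__init__()
        self.wq, self.wk, self.wv = (nn.Linear(dim, dim, bias=False) for _ in range(3))
    def forward(self, q, k, v):
        scores = self.wq(q) @ self.wk(k).transpose(1, 2)      # (B, Lq, Lk)
        return torch.softmax(scores, dim=-1) @ self.wv(v)     # (B, Lq, dim)

class Stage1Encoder(nn.Module):
    def __init__(self, emb_dim=300, feat_dim=20, hidden=300, n_tags=3):
        super().__init__()
        self.ctx_lstm = nn.LSTM(emb_dim + feat_dim, hidden, num_layers=2,
                                bidirectional=True, batch_first=True)
        self.ans_lstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                                bidirectional=True, batch_first=True)
        self.fuse = DotAttention(2 * hidden)                  # 600-dim states
        self.w1 = nn.Linear(2 * hidden, 2400)                 # stands in for the 1-D convolutions
        self.w2 = nn.Linear(2400, 2 * hidden)
        self.norm = nn.LayerNorm(2 * hidden)
        self.w3 = nn.Linear(2 * hidden, n_tags)               # per-token tag scores

    def forward(self, ctx_emb, ctx_feat, ans_emb):
        s_c, _ = self.ctx_lstm(torch.cat([ctx_emb, ctx_feat], dim=-1))  # (B, Lc, 600)
        s_a, _ = self.ans_lstm(ans_emb)                                  # (B, La, 600)
        h = self.fuse(s_c, s_a, s_a)        # fuse answer information into the context side
        ffn = self.norm(self.w2(torch.relu(self.w1(h))) + h)            # FFN with residual + LayerNorm
        return self.w3(ffn), s_a            # tag scores for the CRF, answer states for the decoder
```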
The CRF structure is used to obtain the transition probabilities between labels, and the loss function is the negative log-likelihood
Loss = -log P(y | x) = log Σ_{y*} exp(Score(x, y*)) − Score(x, y)
wherein: x denotes the input sequence, y denotes the true label sequence, y* ranges over all possible predicted label sequences, Score(x, y) is the score of the true annotated sequence, and the sum over the scores of all possible predicted sequences is computed step by step from per-step path scores (the forward algorithm), which greatly reduces the amount of computation.
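The sketch below is a generic linear-chain CRF negative log-likelihood of the form above, computed with the per-step forward algorithm; it is assumed to match the patent's CRF layer in spirit, and batching and padding masks are omitted for brevity.

```python
import torch
import torch.nn as nn

class CRFLoss(nn.Module):
    def __init__(self, n_tags):
        super().__init__()
        self.trans = nn.Parameter(torch.zeros(n_tags, n_tags))  # trans[i, j]: score of tag i -> tag j

    def gold_score(self, emissions, tags):                      # emissions: (L, n_tags), tags: (L,)
        score = emissions[0, tags[0]]
        for t in range(1, emissions.size(0)):
            score = score + self.trans[tags[t - 1], tags[t]] + emissions[t, tags[t]]
        return score

    def log_partition(self, emissions):
        """log of the sum over all tag paths, one forward-algorithm step per position."""
        alpha = emissions[0]                                     # (n_tags,)
        for t in range(1, emissions.size(0)):
            # alpha[j] = logsumexp_i(alpha[i] + trans[i, j]) + emissions[t, j]
            alpha = torch.logsumexp(alpha.unsqueeze(1) + self.trans, dim=0) + emissions[t]
        return torch.logsumexp(alpha, dim=0)

    def forward(self, emissions, tags):
        # negative log-likelihood: log Z(x) - Score(x, y)
        return self.log_partition(emissions) - self.gold_score(emissions, tags)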
As shown in fig. 3, the question generation module reuses the answer-fused context matrix H from stage 1 to reduce the computational complexity of the model. The label values output by stage 1 are vectorized as an additional feature E, and E concatenated with H serves as the encoder input; a gated self-attention mechanism abstracts the features at a higher level. During decoding, this embodiment attends not only to the encoder output but also to the output of the answer encoder in the context sequence labeling module, and uses a pointer network to address the OOV problem, finally producing high-quality questions strongly correlated with the answers. The question generation module comprises: a self-attention encoder, an answer-focused decoder, and a pointer network with a gate structure.
The self-attention encoder is a bidirectional LSTM network; the input is passed through the LSTM to obtain a state vector S. Because long-range dependence in the LSTM causes the vanishing-gradient problem, this embodiment uses a self-attention mechanism with a gate structure, specifically: an attention intermediate state N is obtained with the attention mechanism, the gate structure then filters the information of S and N, and a residual connection facilitates gradient flow and avoids information loss, giving the final state M:
N = Attention(S, S, S),  M = sigmoid(W_G·[S; N]) ⊙ tanh(W_E·[S; N]) + S
wherein W_G and W_E are trainable parameter matrices and ⊙ denotes element-wise multiplication.
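A self-contained sketch of this gated self-attention layer follows; the projection sizes are assumptions, and the attention uses the same projected dot-product form as in stage 1.

```python
import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    """N = Attention(S, S, S); M = sigmoid(W_G·[S;N]) * tanh(W_E·[S;N]) + S."""
    def __init__(self, dim):
        super().__init__()
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.w_g = nn.Linear(2 * dim, dim, bias=False)   # gate
        self.w_e = nn.Linear(2 * dim, dim, bias=False)   # candidate

    def forward(self, s):                                # s: (B, L, dim) BiLSTM states
        scores = self.wq(s) @ self.wk(s).transpose(1, 2) # (B, L, L)
        n = torch.softmax(scores, dim=-1) @ self.wv(s)   # intermediate state N
        sn = torch.cat([s, n], dim=-1)                   # [S; N]
        return torch.sigmoid(self.w_g(sn)) * torch.tanh(self.w_e(sn)) + s   # M with residual
```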
The decoder comprises a two-layer unidirectional LSTM structure. During decoding it attends both to the final encoder state M and to the state vector S_a output by the answer encoder in the dual-input encoder of the context sequence labeling module, compensating for the information lost through high-level abstraction. Specifically:
context attention vector: c_t = Attention(h_{t-1}, M, M)
answer attention vector: a_t = Attention(h_{t-1}, S_a, S_a)
LSTM state transition: h_t = LSTM([g_{t-1}; c_t; a_t], h_{t-1})
wherein: g_{t-1} is the GloVe pre-trained word vector of the previous word of the current predicted sequence; c_t is the fusion vector obtained by attending the previous output h_{t-1} to the encoder output M; and a_t is the attention vector containing answer information obtained by attending h_{t-1} to the answer state vector S_a.
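The sketch below shows one decoder step under these equations. The simple unprojected dot-product attention, the exact input layout of the LSTM cell, and the assumption that the state dimensions of M and S_a match the decoder output are simplifications for illustration.

```python
import torch
import torch.nn as nn

def attend(query, memory):
    """Dot-product attention; returns the context vector and the attention weights."""
    scores = (memory @ query.unsqueeze(-1)).squeeze(-1)          # (B, L)
    weights = torch.softmax(scores, dim=-1)
    return (weights.unsqueeze(1) @ memory).squeeze(1), weights   # (B, dim), (B, L)

class DualAttnDecoderStep(nn.Module):
    """One step: h_t = LSTM([g_{t-1}; c_t; a_t], h_{t-1})."""
    def __init__(self, emb_dim, dim):
        super().__init__()
        self.cell = nn.LSTM(emb_dim + 2 * dim, dim, num_layers=2, batch_first=True)

    def forward(self, g_prev, h_prev, state, m, s_a):
        # g_prev: (B, emb) previous word embedding, h_prev: (B, dim) previous output,
        # m: (B, Lc, dim) encoder output, s_a: (B, La, dim) answer encoder states
        c_t, ctx_weights = attend(h_prev, m)       # context attention vector + weights over M
        a_t, _ = attend(h_prev, s_a)               # answer attention vector
        x = torch.cat([g_prev, c_t, a_t], dim=-1).unsqueeze(1)
        out, state = self.cell(x, state)           # state: (h, c) of the 2-layer LSTM
        return out.squeeze(1), c_t, ctx_weights, state
```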
The pointer network first takes h_t concatenated with the context attention vector as input to a linear layer, which outputs the probabilities P_gen of all words in the dictionary. The attention weights of h_t over M in the decoder are then used directly as the copy probability P_copy of the context words. Finally, a gate structure regulates the ratio of P_copy to P_gen to obtain the final output P_final = G_copy·P_copy + (1 − G_copy)·P_gen, wherein G_copy is the probability produced by the gate structure.
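A sketch of this copy/generate mixture follows; it scatters the attention weights over M (for example, the ctx_weights returned by the decoder sketch above) back onto the dictionary ids of the context words. The gate input [h_t; c_t] is an assumption.

```python
import torch
import torch.nn as nn

class CopyGenerator(nn.Module):
    """P_final = G_copy * P_copy + (1 - G_copy) * P_gen."""
    def __init__(self, dim, vocab_size):
        super().__init__()
        self.gen = nn.Linear(2 * dim, vocab_size)     # P_gen from [h_t; c_t]
        self.gate = nn.Linear(2 * dim, 1)             # G_copy from [h_t; c_t]

    def forward(self, h_t, c_t, attn_weights, ctx_ids):
        # attn_weights: (B, L_ctx) attention of h_t over M; ctx_ids: (B, L_ctx) long word ids
        feats = torch.cat([h_t, c_t], dim=-1)
        p_gen = torch.softmax(self.gen(feats), dim=-1)             # (B, V)
        g_copy = torch.sigmoid(self.gate(feats))                   # (B, 1)
        p_copy = torch.zeros_like(p_gen).scatter_add_(1, ctx_ids, attn_weights)
        return g_copy * p_copy + (1 - g_copy) * p_gen              # P_final
```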
During training, this embodiment first initializes the linear layers and LSTMs of the model with orthogonal parameters; the first two thousand pre-trained word vectors are fine-tuned during training while the remaining word vectors are kept fixed; an SGD optimizer with momentum 0.8 is used, the initial learning rate is 0.1, and after 8 epochs the learning rate is halved every 4 epochs; the best training result is reached at the 40th epoch.
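A sketch of this training setup is given below. The schedule lambda reflects one reading of "halved every 4 epochs after 8 epochs" (first halving at epoch 8), and `build_optimizer` and its argument `model` are assumed names standing in for the full two-stage network.

```python
import torch
import torch.nn as nn

def init_weights(module):
    """Orthogonal initialization for the weight matrices of linear and LSTM layers."""
    if isinstance(module, (nn.Linear, nn.LSTM)):
        for name, param in module.named_parameters():
            if "weight" in name and param.dim() >= 2:
                nn.init.orthogonal_(param)

def lr_factor(epoch):
    # keep lr = 0.1 for the first 8 epochs, then halve every 4 epochs
    return 1.0 if epoch < 8 else 0.5 ** ((epoch - 8) // 4 + 1)

def build_optimizer(model):
    model.apply(init_weights)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.8)
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_factor)
    return optimizer, scheduler
```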
During inference, this embodiment uses beam search with beam size 10 and the best model obtained during training.
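The generic beam-search sketch below illustrates the size-10 search used at inference; `step_fn(prefix)` is an assumed interface that returns log-probabilities over the dictionary for the next word given a partial question.

```python
import heapq

def beam_search(step_fn, start_id, end_id, beam_size=10, max_len=30):
    beams = [(0.0, [start_id])]                       # (cumulative log-prob, token ids)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end_id:
                finished.append((score, seq))         # keep completed hypotheses
                continue
            log_probs = step_fn(seq)                  # sequence of log-probs, one per word id
            top = heapq.nlargest(beam_size, enumerate(log_probs), key=lambda x: x[1])
            for tok, lp in top:
                candidates.append((score + lp, seq + [tok]))
        if not candidates:
            break
        beams = heapq.nlargest(beam_size, candidates, key=lambda x: x[0])
    finished.extend(beams)
    return max(finished, key=lambda x: x[0])[1]       # highest-scoring sequence
```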
Using the above model and the training and inference procedures, experiments were performed on the SQuAD data set. Compared with several current advanced models, the model of this embodiment is clearly improved on the BLEU, METEOR and ROUGE-L metrics; the experimental results are as follows:
Table 1
Model                                BLEU_1  BLEU_2  BLEU_3  BLEU_4  METEOR  ROUGE-L
Du et al. (2017)                     -       -       -       12.28   16.62   39.75
Song et al. (2018)                   -       -       -       13.98   18.77   42.72
Zhao et al. (2018)                   45.69   30.25   22.16   16.85   20.62   44.99
Kim et al. (2019)                    -       -       -       16.20   19.92   43.96
Liu et al. (2019)                    46.58   30.90   22.82   17.55   21.24   44.53
This embodiment (binary identifier)  44.68   29.10   21.12   15.93   20.13   44.26
This embodiment (answer encoder)     45.45   29.98   21.91   16.66   20.46   44.94
This embodiment (two-stage)          46.96   31.68   23.67   18.36   21.43   45.99
The results show that, compared with previous models, the model of this embodiment achieves the current best performance, outperforming the other models on every metric.
The following components are original to the invention, have not been disclosed before, and do not operate in the same manner as any prior reference: a separate answer encoding structure with an attention mechanism over the answer state vector in the decoding stage; and a two-stage approach, namely generating additional features by first sequence-labeling the context and then performing question generation.
The improvements of the answer encoding structure are as follows. The traditional answer labeling approach uses a binary identifier to mark the position of the answer in the context, and integrating this position information effectively improves several aspects of the generated questions. However, the binary identifier itself carries limited information, which motivates this embodiment to find a better alternative that fuses answer information into the network more subtly. The answer encoder derives the fusion matrix H, which implicitly contains answer position information, by encoding the answer and attending to the context state vector. Unlike previous sequence-to-sequence structures that separate the answer, this embodiment adds a self-attention layer after the attention layer to extract high-level answer information. In addition, the answer encoder's state vector is attended to again during decoding to obtain low-level answer information. Clearly, the more fully the answer state vector is utilized, the higher the correlation between the generated question and the correct answer.
The improvements of the two-stage scheme are as follows: the traditional end-to-end architecture directly regenerates questions from the context encoding, which makes it difficult to add extra information. This embodiment uses a two-stage approach that first annotates the context words likely to appear in the question and encodes this information, then reuses the traditional end-to-end structure to generate new words and organize the grammatical structure.
In concrete experiments, under an environment of a single NVIDIA Titan RTX, PyTorch 1.1.0 and CUDA 10.0.130, the first two thousand pre-trained word vectors are fine-tuned during training while the remaining word vectors are fixed, an SGD optimizer with momentum 0.8 is used with an initial learning rate of 0.1, the learning rate is halved every 4 epochs after 8 epochs, and the best model is obtained at the 40th epoch; inference uses this best model with beam search of size 10. The resulting experimental data are shown in Table 1.
An ablation study is used to determine the contribution of each improvement to the overall model. Table 1 shows that, relative to the traditional binary identifier, the separate answer encoder improves BLEU-4, METEOR and ROUGE-L by 0.7, 0.3 and 0.8 points respectively; the context sequence labeling module adds a further 1.7, 1.0 and 1.0 points on top of that, and the overall model improves by 2.4, 1.3 and 1.8 points. The two-stage scheme improves all metrics most noticeably, and the context sequence labeling module contributes the most to the overall model.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (11)

1. A two-stage question generation system based on an end-to-end network, comprising a question-answer data preprocessing module, a context sequence labeling module and a question generation module, wherein: the question-answer data preprocessing module re-divides the data set, extracts features, builds a dictionary, and vectorizes the features and words to obtain a labeled training set and real labels; the context sequence labeling module trains a network model on the labeled data set and produces predicted labels for the context; the question generation module takes the real labels and the predicted labels as input to generate a predicted question sequence, and is trained by back-propagation on the error between the real and predicted sequences to obtain the final maximum-probability question;
the context sequence labeling module comprises: a separate dual input encoder, feedforward network structure, Conditional Random Field (CRF) structure;
the question generation module comprises: an encoder of a self-attention mechanism, a decoder focusing on the answer, and a pointer network having a gate structure.
2. The system of claim 1, wherein the re-division is: the question-answer data preprocessing module receives the SQuAD data set as input and uses one half of the original validation set as the validation set and the other half as the test set.
3. The system of claim 1, wherein the feature extraction and dictionary construction are: counting all words in the divided training set, taking the words whose frequency exceeds the frequency threshold and that are contained in the pre-trained GloVe word vectors as a set, and adding the tokens <UNK>, <PAD>, <S>, </S> (unknown word, padding, start symbol and end symbol) to form the dictionary; the context sequence, question sequence and answer sequence are then converted into indices over this dictionary, denoted W_c, W_q and W_ans respectively; meanwhile, the spaCy toolkit is used to perform named entity recognition on the context sequence, yielding a sequence denoted W_ner, and part-of-speech tagging, yielding a sequence denoted W_pos; finally, the lemmas or word forms of non-stop words of the context that appear in the question are labeled W_emerge.
4. The system of claim 1, wherein the vectorization is: the dictionary indices W_c and W_ans of the context and answer sequences obtained after dictionary and feature construction are vectorized with the GloVe pre-trained word vectors; the named entity sequence W_ner and the part-of-speech sequence W_pos are vectorized with randomly initialized embeddings; and the labels W_emerge of non-stop context words that appear in the question, together with the question sequence indices W_q, serve as the real labels for the context sequence labeling module and the question generation module respectively.
5. The system of claim 1, wherein the separate dual-input encoder comprises a context encoder and an answer encoder, which pass the context sequence and the answer sequence through two different two-layer bidirectional LSTM encoders to obtain two vectors S_c and S_a as the context state vector and the answer state vector:
S_c = [→LSTM_c([g_i; f_i]); ←LSTM_c([g_i; f_i])],  S_a = [→LSTM_a(g_i); ←LSTM_a(g_i)]
wherein: g_i is the GloVe pre-trained word vector; f_i is the additional feature information, containing the named entity and part-of-speech information in the context encoder, while the answer encoder considers only the pre-trained word vectors and f_i is empty; the arrows indicate the direction of the recurrent network; [;] denotes concatenation along the final dimension of the two vectors;
so that the context encoder can perceive the current answer information, an attention mechanism is used to fuse the answer information into the context, obtaining the fusion matrix H; the general form of the attention mechanism is Attention(Q, K, V) = softmax((W_Q·Q)·(W_K·K)^T)·(W_V·V), and H = Attention(S_a, S_c, S_c), where W_Q, W_K and W_V are trainable parameter matrices.
6. The system of claim 1, wherein the feedforward network structure is a fully connected network comprising two linear transformations, a ReLU activation, a residual connection and layer normalization, specifically: FFN(H) = LayerNorm(ReLU(H·W_1 + b_1)·W_2 + b_2 + H)·W_3, wherein the two linear transformations are implemented with one-dimensional convolutions whose input and output are 600-dimensional and whose intermediate dimension is 2400, i.e. W_1 ∈ R^(600×2400) and W_2 ∈ R^(2400×600), and W_3 projects onto the label space; the probability of each label of the context sequence is finally obtained.
7. The system of claim 1, wherein the CRF structure is configured to obtain transition probabilities between labels, and the loss function is the negative log-likelihood
Loss = -log P(y | x) = log Σ_{y*} exp(Score(x, y*)) − Score(x, y)
wherein: x denotes the input sequence, y denotes the true label sequence, y* ranges over all possible predicted label sequences, Score(x, y) is the score of the true annotated sequence, and the sum over the scores of all possible predicted sequences is computed step by step from per-step path scores, which greatly reduces the amount of computation.
8. The system of claim 1, wherein the encoder of the self-attention mechanism is a bidirectional LSTM network; the input is passed through the LSTM network to obtain a state vector S; because long-range dependence in the LSTM causes the vanishing-gradient problem, a self-attention mechanism with a gate structure is used, specifically: an attention intermediate state N is obtained with the attention mechanism, the gate structure then filters the information of S and N, and a residual connection facilitates gradient flow and avoids information loss, giving the final state M: N = Attention(S, S, S), M = sigmoid(W_G·[S; N]) ⊙ tanh(W_E·[S; N]) + S, wherein W_G and W_E are trainable parameter matrices and ⊙ denotes element-wise multiplication.
9. The system of claim 1, wherein the decoder comprises a two-layer unidirectional LSTM structure; during decoding it attends both to the final encoder state M and to the state vector S_a output by the answer encoder of the dual-input encoder in the context sequence labeling module, compensating for the information lost through high-level abstraction, specifically:
context attention vector: c_t = Attention(h_{t-1}, M, M)
answer attention vector: a_t = Attention(h_{t-1}, S_a, S_a)
LSTM state transition: h_t = LSTM([g_{t-1}; c_t; a_t], h_{t-1})
wherein: g_{t-1} is the GloVe pre-trained word vector of the previous word of the current predicted sequence; c_t is the fusion vector obtained by attending the previous output h_{t-1} to the encoder output M; and a_t is the attention vector containing answer information obtained by attending h_{t-1} to the answer state vector S_a.
10. The system of claim 1, wherein the pointer network first takes h_t concatenated with the context attention vector as input to a linear layer, which outputs the probabilities P_gen of all words in the dictionary; the attention weights of h_t over M in the decoder are then used directly as the copy probability P_copy of the context words; a gate structure then regulates the ratio of P_copy to P_gen to obtain the final output P_final = G_copy·P_copy + (1 − G_copy)·P_gen, wherein G_copy is the probability produced by the gate structure.
11. A question-guided two-stage question generation method based on the system of any one of the preceding claims, comprising:
a first stage, based on an LSTM-CRF network whose inputs are a separate context encoder and answer encoder, which marks the words in the context that are likely to appear in the question, where: the context encoder output attends to the answer encoder output to fuse answer information into a fusion matrix H, and a sequence label for the context is finally produced through a feedforward structure;
a second stage, in which the sequence labels produced in the first stage are vectorized and concatenated with the fusion matrix H as the encoder input, and a gated self-attention mechanism is applied to the encoder output to promote information fusion over long contexts; during decoding, both the encoder output and the first-stage answer encoder output are attended to, and a copy mechanism copies words from the context, finally obtaining the question generated by the question-guided two-stage process.
CN202010661187.5A 2019-11-27 2019-11-27 Two-stage problem generating system with problem as guide Active CN111813913B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010661187.5A CN111813913B (en) 2019-11-27 2019-11-27 Two-stage problem generating system with problem as guide

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010661187.5A CN111813913B (en) 2019-11-27 2019-11-27 Two-stage problem generating system with problem as guide
CN201911179784.8 2019-11-27

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201911179784.8 Division 2019-11-27 2019-11-27

Publications (2)

Publication Number Publication Date
CN111813913A true CN111813913A (en) 2020-10-23
CN111813913B CN111813913B (en) 2024-02-20

Family

ID=72846745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010661187.5A Active CN111813913B (en) 2019-11-27 2019-11-27 Two-stage problem generating system with problem as guide

Country Status (1)

Country Link
CN (1) CN111813913B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380843A (en) * 2020-11-18 2021-02-19 神思电子技术股份有限公司 Random disturbance network-based open answer generation method
CN112668338A (en) * 2021-03-22 2021-04-16 中国人民解放军国防科技大学 Clarification problem generation method and device and electronic equipment
CN112819787A (en) * 2021-02-01 2021-05-18 清华大学深圳国际研究生院 Multi-light source prediction method
CN113128206A (en) * 2021-04-26 2021-07-16 中国科学技术大学 Question generation method based on word importance weighting
CN113268564A (en) * 2021-05-24 2021-08-17 平安科技(深圳)有限公司 Method, device and equipment for generating similar problems and storage medium
CN116681087A (en) * 2023-07-25 2023-09-01 云南师范大学 Automatic problem generation method based on multi-stage time sequence and semantic information enhancement

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180329883A1 (en) * 2017-05-15 2018-11-15 Thomson Reuters Global Resources Unlimited Company Neural paraphrase generator
US20180349359A1 (en) * 2017-05-19 2018-12-06 salesforce.com,inc. Natural language processing using a neural network
CN109657041A (en) * 2018-12-04 2019-04-19 南京理工大学 The problem of based on deep learning automatic generation method
CN109684452A (en) * 2018-12-25 2019-04-26 中科国力(镇江)智能技术有限公司 A kind of neural network problem generation method based on answer Yu answer location information

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180329883A1 (en) * 2017-05-15 2018-11-15 Thomson Reuters Global Resources Unlimited Company Neural paraphrase generator
US20180349359A1 (en) * 2017-05-19 2018-12-06 salesforce.com,inc. Natural language processing using a neural network
CN109657041A (en) * 2018-12-04 2019-04-19 南京理工大学 The problem of based on deep learning automatic generation method
CN109684452A (en) * 2018-12-25 2019-04-26 中科国力(镇江)智能技术有限公司 A kind of neural network problem generation method based on answer Yu answer location information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
任智慧; 徐浩煜; 封松林; 周晗; 施俊: "Sequence-labeling-based Chinese word segmentation using an LSTM network", 计算机应用研究 (Application Research of Computers), vol. 34, no. 5

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380843A (en) * 2020-11-18 2021-02-19 神思电子技术股份有限公司 Random disturbance network-based open answer generation method
CN112819787A (en) * 2021-02-01 2021-05-18 清华大学深圳国际研究生院 Multi-light source prediction method
CN112819787B (en) * 2021-02-01 2023-12-26 清华大学深圳国际研究生院 Multi-light source prediction method
CN112668338A (en) * 2021-03-22 2021-04-16 中国人民解放军国防科技大学 Clarification problem generation method and device and electronic equipment
US11475225B2 (en) 2021-03-22 2022-10-18 National University Of Defense Technology Method, system, electronic device and storage medium for clarification question generation
CN113128206A (en) * 2021-04-26 2021-07-16 中国科学技术大学 Question generation method based on word importance weighting
CN113268564A (en) * 2021-05-24 2021-08-17 平安科技(深圳)有限公司 Method, device and equipment for generating similar problems and storage medium
CN113268564B (en) * 2021-05-24 2023-07-21 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating similar problems
CN116681087A (en) * 2023-07-25 2023-09-01 云南师范大学 Automatic problem generation method based on multi-stage time sequence and semantic information enhancement
CN116681087B (en) * 2023-07-25 2023-10-10 云南师范大学 Automatic problem generation method based on multi-stage time sequence and semantic information enhancement

Also Published As

Publication number Publication date
CN111813913B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN111813913B (en) Two-stage problem generating system with problem as guide
Logeswaran et al. Sentence ordering and coherence modeling using recurrent neural networks
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
Erdem et al. Neural natural language generation: A survey on multilinguality, multimodality, controllability and learning
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
Touati-Hamad et al. Arabic quran verses authentication using deep learning and word embeddings
Vitiugin et al. Emotion Detection for Spanish by Combining LASER Embeddings, Topic Information, and Offense Features.
Kondurkar et al. Modern Applications With a Focus on Training ChatGPT and GPT Models: Exploring Generative AI and NLP
Wang et al. Augmentation with projection: Towards an effective and efficient data augmentation paradigm for distillation
CN111813907A (en) Question and sentence intention identification method in natural language question-answering technology
Popattia et al. Guiding attention using partial-order relationships for image captioning
Khan et al. Pretrained natural language processing model for intent recognition (bert-ir)
Gormley Graphical models with structured factors, neural factors, and approximation-aware training
Aggarwal et al. GPTs at Factify 2022: Prompt aided fact-verification
Kreyssig Deep learning for user simulation in a dialogue system
Schick Few-shot learning with language models: Learning from instructions and contexts
Phade et al. Question Answering System for low resource language using Transfer Learning
Bensghaier et al. Investigating the Use of Different Recurrent Neural Networks for Natural Language Inference in Arabic
Xia Natural Language Understanding for Conversational Agents
Yolchuyeva Novel NLP Methods for Improved Text-To-Speech Synthesis
Shafiq et al. Enhancing Arabic Aspect-Based Sentiment Analysis Using End-to-End Model
Cao et al. Predict, pretrained, select and answer: Interpretable and scalable complex question answering over knowledge bases
Kulkarni et al. Deep Reinforcement-Based Conversational AI Agent in Healthcare System
Saeedi et al. Reusable Toolkit for Natural Language Processing in an Ambient Intelligence Environment
Sisodia Semantic Textual Similarity on Contracts: Exploring Multiple Negative Ranking Losses for Sentence Transformers

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant