CN112836482A - Method and device for generating questions with a template-based sequence generation model - Google Patents

Method and device for generating questions with a template-based sequence generation model

Info

Publication number
CN112836482A
CN112836482A (application CN202110181755.6A)
Authority
CN
China
Prior art keywords
text
model
word
template
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110181755.6A
Other languages
Chinese (zh)
Other versions
CN112836482B (en)
Inventor
李玉娥
董黎刚
蒋献
吴梦莹
诸葛斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN202110181755.6A priority Critical patent/CN112836482B/en
Publication of CN112836482A publication Critical patent/CN112836482A/en
Application granted granted Critical
Publication of CN112836482B publication Critical patent/CN112836482B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F40/186 — Handling natural language data; Text processing; Editing, e.g. inserting or deleting; Templates
    • G06F16/35 — Information retrieval of unstructured textual data; Clustering; Classification
    • G06F16/367 — Creation of semantic tools, e.g. ontology or thesauri; Ontology
    • G06F40/30 — Handling natural language data; Semantic analysis
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method and a system for generating questions with a template-based sequence generation model. A text extraction model is constructed to classify the text from which questions are to be generated, yielding a predicted text. A text relation recognition model is constructed that converts the text into vector representations; from these representations and the word-level and sentence-level attention mechanisms built into the model, it obtains the input sequences corresponding to character vectors and word vectors and classifies them to produce a predicted relation label for the predicted text. Finally, a sequence generation model is constructed: its encoding unit receives a question and maps it to a tuple vector, the vector is fed to the template decoding unit, and the question template output by the decoding unit is filled in according to the predicted relation label.

Description

Method and device for generating questions with a template-based sequence generation model
Technical Field
The invention relates to the technical field of natural language processing in artificial intelligence, and in particular to a method and a device for generating questions with a template-based sequence generation model.
Background
At present, most research on Chinese question generation in the field of natural language processing builds on knowledge graphs constructed with template-based or rule-based models; owing to the limitations of templates and rules, the generated questions are of a single type and their language lacks flexibility.
Rule-based methods typically require a great deal of effort and time, and the resulting questions often contain incoherent sentences that do not match the content of the article. Questions generated by template-based methods are rigid, of a single type, and lacking in language diversity, and their quality is determined directly by the quality of the templates. Question generation methods based on sequence models suffer from problems such as unclear identification of the subject entity, which likewise degrades the quality of the generated questions.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a method and a device for generating questions with a template-based sequence generation model, which combines the template-based approach with the sequence-generation approach and improves the quality of questions generated from a knowledge graph by the sequence generation model.
To achieve the above object, the present invention provides a method for generating questions with a template-based sequence generation model, comprising:
constructing a text extraction model: inputting the text content from which questions are to be generated into a joint model, and classifying the text with the joint model to obtain a predicted text;
constructing a text relation recognition model: obtaining text semantic feature vectors from the predicted text content and training the text relation recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed by the word-level and sentence-level attention mechanisms built into the model to obtain the input sequences corresponding to character vectors and word vectors, and these are classified according to their input sequences to obtain the predicted relation label of the predicted text;
and constructing a sequence generation model comprising an encoding unit and a template decoding unit: a question is input to the encoding unit and mapped, according to its semantics, to corresponding tuple vectors; the tuple vectors are fed in turn to the template decoding unit; the output of the template decoding unit serves as the question template; and the question template is filled in according to the predicted relation label.
Optionally, the text relation recognition model comprises an input representation layer, a word-level layer, a sentence-level layer, and an entity relation classification layer, which operate on the character vectors and word-vector sequences corresponding to the text's semantic information, wherein
the input layer converts input characters into vector representations and obtains the text semantic feature vectors;
the word-level layer learns the contextual content of the text to obtain the importance of each word to the text's semantic information;
the sentence-level layer assigns a different weight to each output word according to the surrounding sentences, obtaining the importance of each word to the sentence information;
and the entity relation classification layer normalizes these importance values to obtain the relation label of a vector, thereby classifying the relations between entities.
Optionally, the step of constructing the text extraction model comprises:
the joint model is a combined structure of a bidirectional long short-term memory (Bi-LSTM) network and a conditional random field (CRF) model, and the text extraction model has a three-layer structure of word-vector representation, sentence feature extraction, and sentence-level sequence labeling,
wherein the text data is sequence-labeled to obtain a training-set text,
the training-set text is read as input to the Bi-LSTM network for unsupervised training, so that the network initializes the weights of the training-set text and constructs a feature space;
based on the feature space and the text weights, supervised learning is performed on the training-set text with the CRF model;
the classification probability of each word in the training-set text is obtained with a normalization function;
and classification with the obtained probabilities yields the predicted text.
Optionally, the step of constructing the text relation recognition model comprises:
inputting the predicted text content into a pre-trained Word2vec model to convert the characters into low-dimensional dense vector representations, these vector representations being the text semantic feature vectors, and inputting the text semantic feature vectors into the word-level learning layer of the text relation recognition model to obtain the character-sense, word-sense, and context information they contain.
Optionally, the step of constructing the text relation recognition model further comprises:
after the text semantic feature vectors are obtained, inputting them into the sentence-level learning layer of the text relation recognition model, obtaining the weight of each word of the text semantic feature vector, and computing each word's attention value by a weighted average;
and normalizing the attention values, obtaining the predicted relation label of the text semantic feature vector from the normalized values, and classifying the predicted text according to the predicted relation label to obtain the sentence entity.
Optionally, the predicted relation labels comprise the 16 labels defined by HowNet plus 5 custom labels.
Optionally, the content of the tuple vector to which the input question is mapped comprises a subject entity, an entity relation, and an entity, wherein the received question is posed in terms of the subject entity and the entity relation and can be answered by the entity.
An embodiment of the invention provides a device for generating questions with a template-based sequence generation model, comprising:
a text extraction model module, configured to input the text content from which questions are to be generated into a joint model and to classify the text with the joint model to obtain a predicted text;
a text relation recognition model module, configured to train the text relation recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed by the word-level and sentence-level attention mechanisms built into the model, the input sequences corresponding to character vectors and word vectors are obtained, the vectors are classified according to these input sequences, and the predicted relation label of the predicted text is obtained;
a sequence generation model module, wherein the sequence generation model comprises an encoding unit and a template decoding unit: a question is input to the encoding unit, mapped according to its semantics to corresponding tuple vectors, and fed in turn to the template decoding unit; the output of the template decoding unit serves as the question template, which is filled in according to the predicted relation label.
An embodiment of the invention provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the steps of any one of the above methods.
Compared with the prior art, the method for generating questions with the combined template-and-sequence model has the following beneficial effects:
A text extraction model is constructed, and the text from which questions are to be generated is classified by the bidirectional long short-term memory network and conditional random field model inside it to obtain a predicted text. Relation recognition is then performed on the predicted text; because word-level and sentence-level relation models are added, the classification of the predicted text is more accurate and the obtained predicted relation label agrees better with the text semantics, avoiding incoherent sentences and mismatches with the text content. A question is input to the encoding unit of the sequence generation model, the template decoding unit then outputs a question template, and this template can be filled in according to the predicted relation label obtained from the text relation recognition model; that is, the subject-entity placeholder in the output template can be replaced under any label, so the generated questions are flexible in type and rich in language.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
This document provides an overview of various implementations or examples of the technology described in this disclosure, and is not a comprehensive disclosure of the full scope or all features of the disclosed technology.
Drawings
FIG. 1 is a schematic diagram of entity relation labels in an embodiment of the present invention;
FIG. 2 is a diagram illustrating the format used by the training set and test set in an embodiment of the present invention;
FIG. 3 is a diagram of examples generated from part of the knowledge graph in an embodiment of the present invention;
FIG. 4 is an illustration of experimental results of knowledge-graph-based question generation in an embodiment of the present invention;
FIG. 5 is a block diagram of the overall framework of the entity extraction function according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the overall structure of the entity extraction model according to an embodiment of the present invention;
FIG. 7 is a functional framework diagram of entity relation recognition in an embodiment of the present invention;
FIG. 8 is a diagram illustrating the entity relation recognition model according to an embodiment of the present invention;
FIG. 9 is an illustration of the results of the sequence-based generation model in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments will be described below clearly and completely with reference to the accompanying drawings. It is to be understood that the described embodiments are only a few, and not all, of the embodiments of the present disclosure. All other embodiments that a person skilled in the art can derive from the described embodiments without inventive effort fall within the scope of protection of the disclosure.
Unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure should have the ordinary meaning as understood by those having ordinary skill in the art to which the present disclosure belongs. The use of the terms "comprising" or "including" and the like in the embodiments of the present disclosure is intended to mean that the elements or items listed before the term cover the elements or items listed after the term and their equivalents, without excluding other elements or items.
To maintain the following description of the disclosed embodiments clear and concise, detailed descriptions of known functions and known components are omitted from the disclosed embodiments.
An embodiment of the invention provides a method for generating questions with a template-based sequence generation model, which comprises the following steps:
Constructing a text extraction model: the text content from which questions are to be generated is input into the joint model, and the joint model classifies the text to obtain a predicted text. The text extraction model comprises three layers: word-vector representation, sentence feature extraction, and sentence-level sequence labeling.
The joint model is a combined structure of a bidirectional long short-term memory network (Bi-LSTM) and a conditional random field (CRF) model: sentence entities and entity context information are extracted by the Bi-LSTM, and the entities are classified with the CRF. The data set is sequence-labeled with the BIO labeling strategy; each word vector in a sentence passes through a Bi-LSTM layer whose forward and backward hidden-state sequences are concatenated, so that the model learns the context of each word vector; the CRF then performs supervised learning on the hidden-state sequences output by the Bi-LSTM; and finally a normalization function yields the classification probability of each word in the sentence.
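By way of illustration only, the joint structure described above can be sketched as follows; this is a minimal example assuming toy dimensions and an illustrative tag set, not the exact implementation of the embodiment, and the CRF transition matrix is taken as given here rather than learned.

```python
# Minimal sketch of the Bi-LSTM + CRF entity extraction model (illustrative
# dimensions and tag set; the real model also learns the transition matrix).
import numpy as np
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, HIDDEN, NUM_TAGS = 5000, 128, 64, 5  # e.g. B-PER, I-PER, B-LOC, I-LOC, O

# Bi-LSTM encoder: forward and backward hidden states are concatenated,
# so each token's representation carries context from both directions.
inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(inputs)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(HIDDEN, return_sequences=True))(x)
emissions = tf.keras.layers.Dense(NUM_TAGS)(x)   # per-token tag scores
model = tf.keras.Model(inputs, emissions)

def viterbi_decode(emission: np.ndarray, transition: np.ndarray) -> list:
    """CRF decoding: choose the globally best tag sequence rather than a
    per-token argmax, which is what the CRF layer contributes."""
    n, k = emission.shape
    score = emission[0].copy()
    backptr = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        total = score[:, None] + transition + emission[t]  # (prev_tag, curr_tag)
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```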
In the embodiment of the present invention, referring to FIG. 5 and FIG. 6, the overall framework of the entity extraction function is divided into a training part and a testing part, and the text-set data is split accordingly: 80% of the text data is used as the training set and the remaining 20% as the test set. In the conditional random field (CRF) layer of the entity extraction model structure diagram, "B-PER", "O", and "B-LOC" are labels of the labeling strategy.
In an embodiment of the present invention, referring to FIG. 2, the format used for the training set and test set includes the questions, the triples corresponding to the questions, and the answers to the questions, so that the final text extraction model captures typical text features.
Specifically, given a text, the data set is constructed by sequence labeling. Because the text consists of continuous sentences, it is first segmented to obtain its characters and words, so that the text can be represented structurally in terms of every character and word appearing in it. The obtained characters and words are sequence-labeled to produce the training-set text, which is modeled directly without further annotation; that is, the training-set text serves as input to the bidirectional long short-term memory network for unsupervised training, i.e., the unsupervised learning step of the Bi-LSTM.
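The BIO labeling strategy itself can be shown with a small example; the sentence, spans, and entity types below are invented for illustration and are not taken from the embodiment's data set.

```python
# Illustrative BIO sequence labeling of a pre-segmented sentence.
tokens = ["浙江", "工商", "大学", "位于", "杭州"]
entities = {(0, 3): "ORG", (4, 5): "LOC"}  # token span -> entity type

tags = ["O"] * len(tokens)
for (start, end), etype in entities.items():
    tags[start] = f"B-{etype}"            # B- marks the beginning of an entity
    for i in range(start + 1, end):
        tags[i] = f"I-{etype}"            # I- marks its continuation

print(list(zip(tokens, tags)))
# [('浙江', 'B-ORG'), ('工商', 'I-ORG'), ('大学', 'I-ORG'), ('位于', 'O'), ('杭州', 'B-LOC')]
```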
The Bi-LSTM model initializes the weights of the training-set text and constructs the feature space. Because a typical mathematical model accepts only numerical input, the text is first converted into vector representations comprising character vectors and word vectors; in other words, the meaning carried by a text is encoded numerically when expressed as vectors. The weight of the training-set text can be understood as finding, in the text content used for generating questions, keywords whose weights express how important each word is to the text and, indirectly, how well the word predicts the text's topic. A vector space model is then used: the text set is regarded as a collection of vectors in which each word corresponds to a coordinate axis carrying the weight of that word, and this constructs the feature space.
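One plausible realization of such a word-per-axis feature space is TF-IDF weighting over the segmented text; this is an assumption for illustration, since the embodiment does not fix a particular weighting scheme.

```python
# Sketch of the vector space model: each word is a coordinate axis and its
# TF-IDF weight stands in for the word's "importance to the text".
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["数据 的 逻辑 结构", "堆 排序 构造 完全 二叉树"]   # pre-segmented texts
vectorizer = TfidfVectorizer(token_pattern=r"(?u)\S+")      # split on whitespace
matrix = vectorizer.fit_transform(docs)                     # one vector per text
print(vectorizer.get_feature_names_out())                   # the coordinate axes
print(matrix.toarray())                                     # per-word weights
```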
Based on the feature space and the text weights, supervised learning is performed on the training-set text with the conditional random field model, i.e., the supervised CRF learning in FIG. 5. Training on the labeled training-set text yields an optimal model, which maps the training set to corresponding outputs; classification is realized by judging these outputs. The conditional random field model is the preferred choice here, and the training-set text is segmented on the basis of it; since many mature tools exist for Chinese word segmentation, the details are not repeated here.
Based on the above steps, the classification probability of each word in the training-set text is obtained with a normalization function, and classification with these probabilities yields the predicted text.
Constructing a text relation recognition model: text semantic feature vectors are obtained from the content of the predicted text, and the model is trained with a training set carrying relation labels. The text semantic feature vectors are processed by the word-level and sentence-level attention mechanisms built into the model to obtain the input sequences corresponding to the character vectors and word vectors, and the text is classified according to these input sequences, or equivalently, according to the character and word vectors corresponding to the text.
In the embodiment of the present invention, referring to FIG. 7 and FIG. 8, the entity relation recognition model is divided into a training part and a testing part, where 80% of the text-set data is used as the training set and the remaining 20% as the test set.
Referring to FIG. 2, the format used by the training set and test set in an embodiment of the present invention comprises the subject entity, the entity relation, and the entity, following the triple content set by the input question. In other embodiments, the tuples to which a question is mapped include, but are not limited to, pairs, quadruples, and the like.
Specifically, the predicted text output by the entity extraction model is used to construct the data set. Referring to FIG. 8, the entity relation recognition model adds a Bi-GRU with a word-level attention mechanism (AM) and a Bi-GRU with a sentence-level AM on top of a basic input representation layer and an entity relation classification layer;
the entity relation classification layer further comprises a hidden layer, whose purpose is to obtain a clearer and better classification result.
The input layer converts input characters into vector representations and obtains the text semantic feature vectors;
the word-level layer learns the contextual content of the text to obtain the importance of each word to the text's semantic information;
the sentence-level layer assigns a different weight to each output word according to the surrounding sentences, obtaining the importance of each word to the sentence information.
Specifically, a joint structure of bidirectional gated recurrent unit (Bi-GRU) layers and attention mechanisms is built with the TensorFlow framework and trained on the relation-labeled training data set to construct the entity relation recognition model.
The format of the relation-labeled training data set follows the FIG. 2 format described above for the extraction model.
The step of constructing the text relation recognition model comprises:
inputting the predicted text into a pre-trained Word2vec model to convert the continuous characters into low-dimensional dense vector representations, these being the text semantic feature vectors, and feeding them into the word-level learning layer to obtain the character-sense, word-sense, and context information they contain;
and, using a vector space model, regarding the text set as a collection of vectors in which each word corresponds to a coordinate axis carrying the weight of that word, thereby constructing the feature space.
The step of constructing the text relation recognition model further comprises:
when the text semantic feature vectors are obtained, the semantics of the text must be understood, so sentence-level vectors are needed in addition to word vectors. Whether the sentence following the current one is a genuine continuation or noise is judged; that is, a Bi-GRU layer with a sentence-level attention mechanism is placed in the text relation recognition model. Its function is to assign a different weight to each output word according to the context sentences, obtain the importance of each word to the sentence information, and compute each word's attention value as a weighted average.
The attention values are normalized to obtain the classification probability of each word in the training-set text, from which the predicted relation label of the text semantic feature vector is obtained (for the relation labels, see FIG. 1); the predicted text is classified according to the predicted relation label to obtain the subject entity, and the error between the predicted label and the true label is minimized by adjusting the parameters.
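The attention computation just described, namely scoring each Bi-GRU output, normalizing the scores, and taking the weighted average, can be sketched in a few lines of numpy; the scoring vector is random here for illustration, whereas in the model it is a learned parameter.

```python
# Sketch of the sentence-level attention: normalized per-word attention
# values weight the Bi-GRU hidden states into one sentence vector.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 2 * 64))   # 6 words; forward||backward GRU states
w = rng.normal(size=(2 * 64,))     # attention scoring vector (learned in practice)

scores = np.tanh(H) @ w                          # importance of each word
alpha = np.exp(scores) / np.exp(scores).sum()    # normalized attention values
sentence_vec = alpha @ H                         # weighted average of word states
```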
Take the following texts as examples. The input text "A woman who understands love usually ends in tragedy; love tolerates no compromise, and the winner takes all. Feelings can be overdrawn and a marriage can be frozen at any time once love is treated as a transaction" is finally classified by the text relation recognition model as "woman"; that is, the sentence entity obtained from this text is "woman". The input text "What is the logical relationship between data?" is classified, after passing through the model, as "data"; that is, the sentence entity of this segment is "data".
Referring to FIG. 1, the predicted relation labels comprise the 16 labels defined by HowNet plus 5 custom labels, and the training-set format adopted is: text 1, text 2, the relation label, and a sentence in which the text pair occurs together.
A sequence generation model is then constructed; referring to FIG. 9, it comprises an encoding unit and a template decoding unit. The encoding unit receives the question and maps it to corresponding tuple vectors, which are fed in turn to the template decoding unit; the output of the template decoding unit serves as the question template, and the template is filled in according to the predicted relation label.
In this embodiment the encoding unit is a triple encoder. A triple F = (subject entity T, entity relation R, entity O) is input to it, representing a question that bears relation R to the subject entity T and can be answered by the entity O; an example triple is shown in FIG. 2. The triple encoder first maps the triple into a real-valued vector space, and the resulting vectors are fed in turn to the template decoding unit, whose output is a question template in which the subject entity is replaced by a placeholder label.
The template decoding unit converts the question template into a complete question by replacing the placeholder label with the specific subject entity T.
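The encode-decode-replace pipeline can be sketched end to end as below; the encoder and decoder are stubs standing in for the trained networks, and the placeholder label "<S>" is an illustrative choice rather than the embodiment's actual label.

```python
# Sketch of the triple encoder + template decoding unit.
def encode_triple(subject: str, relation: str, obj: str) -> tuple:
    return (subject, relation, obj)       # stand-in for real-valued vectors

def decode_template(triple_vec: tuple) -> str:
    # A trained decoder would generate this template.
    return "<S>的定义是什么？"             # "What is the definition of <S>?"

def generate_question(subject: str, relation: str, obj: str) -> str:
    template = decode_template(encode_triple(subject, relation, obj))
    return template.replace("<S>", subject)   # fill the placeholder with T

print(generate_question("逻辑结构", "定义", "数据元素之间的关系"))
# 逻辑结构的定义是什么？
```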
Referring to FIG. 3, for the actual question "what is the concept of the logical structure", the question finally generated by the method of the invention is "what is the definition of S?", whose semantics are predicted accurately. Here S should be understood as the text subject, i.e., the sentence entity obtained by classification, and S bears the predicted relation label with respect to the data; for example, S can heap-sort the data into a complete binary tree. The content of S includes, but is not limited to, the logical structure, which ensures the completeness of the generated question.
To compare model performance, this embodiment uses the same question-generation data set and experimental environment, designs a knowledge-graph-based question-generation experiment, and compares the correctness of the questions produced by three generation methods. The model used in this scheme is compared with a template-based model and a sequence-based model, and three evaluation metrics, BLEU, METEOR, and ROUGE, measure the quality of the questions generated by the three methods. The comparative results, shown in the table of FIG. 4, indicate that the proposed model performs best.
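Scoring one generated question against a reference with the three metrics can be sketched as follows; the package choices (nltk and rouge-score) are assumptions, since the embodiment does not name its tooling, and METEOR additionally requires NLTK's wordnet data to be downloaded.

```python
# Sketch: BLEU, METEOR and ROUGE-L for one generated question.
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

reference = "what is the definition of the logical structure".split()
candidate = "what is the concept of the logical structure".split()

bleu = sentence_bleu([reference], candidate)
meteor = meteor_score([reference], candidate)
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(
    " ".join(reference), " ".join(candidate))["rougeL"].fmeasure
print(bleu, meteor, rouge_l)
```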
The invention also provides a device for generating questions with a template-based sequence generation model, comprising:
a text extraction model module, configured to input the text content from which questions are to be generated into the joint model and to classify the text with the joint model to obtain a predicted text;
a text relation recognition model module, configured to train the text relation recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed by the word-level and sentence-level attention mechanisms built into the model, the input sequences corresponding to character vectors and word vectors are obtained, and the vectors are classified according to these input sequences to obtain the predicted relation label;
a sequence generation model module, wherein the sequence generation model comprises an encoding unit and a template decoding unit: the encoding unit receives the question, maps it to corresponding tuple vectors, and feeds them in turn to the template decoding unit; the output of the template decoding unit serves as the question template, which is filled in according to the predicted relation label.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the above method for generating questions with a template-based sequence generation model and of the corresponding system.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of the above method for generating questions with a template-based sequence generation model and of the corresponding system.
The above embodiments are illustrative of the present invention and are not intended to limit it; any simple modification of the present invention falls within its scope of protection.

Claims (10)

1. A method for generating questions with a template-based sequence generation model, comprising:
constructing a text extraction model: inputting the text from which questions are to be generated into a joint model, and classifying the text with the joint model to obtain a predicted text;
constructing a text relation recognition model: obtaining text semantic feature vectors from the predicted text content and training the text relation recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed by the word-level and sentence-level attention mechanisms built into the model to obtain the input sequences corresponding to character vectors and word vectors, which are classified according to these input sequences to obtain the predicted relation label of the predicted text;
and constructing a sequence generation model comprising an encoding unit and a template decoding unit: a question is input to the encoding unit and mapped, according to its semantics, to corresponding tuple vectors; the tuple vectors are fed in turn to the template decoding unit, whose output is the question template; and the question template is filled in according to the predicted relation label.
2. The method of claim 1, wherein the text relation recognition model comprises an input representation layer, a word-level layer, a sentence-level layer, and an entity relation classification layer, wherein
the input layer converts input characters into vector representations and obtains the text semantic feature vectors;
the word-level layer learns the contextual content of the text to obtain the importance of each word to the text's semantic information;
the sentence-level layer assigns a different weight to each output word according to the surrounding sentences to obtain the importance of each word to the sentence information;
and the entity relation classification layer normalizes these importance values to obtain the relation label of a vector, thereby classifying the relations between entities.
3. The method of claim 1, wherein constructing the text extraction model comprises:
the joint model is a combined structure of a bidirectional long short-term memory (Bi-LSTM) network and a conditional random field (CRF) model, and the text extraction model has a three-layer structure of word-vector representation, sentence feature extraction, and sentence-level sequence labeling,
wherein the text data is sequence-labeled to obtain a training-set text,
the training-set text is read as input to the Bi-LSTM network for unsupervised training, so that the network initializes the weights of the training-set text and constructs a feature space;
based on the feature space and the text weights, supervised learning is performed on the training-set text with the CRF model;
the classification probability of each word in the training-set text is obtained with a normalization function;
and classification with the obtained probabilities yields the predicted text.
4. The method for generating questions with a template-based sequence generation model of claim 1, wherein the step of constructing the text relation recognition model comprises:
inputting the predicted text content into a pre-trained Word2vec model to convert the characters into low-dimensional dense vector representations, these being the text semantic feature vectors, and inputting the text semantic feature vectors into the word-level layer of the text relation recognition model to obtain the character-sense, word-sense, and context information they contain.
5. The method for generating questions with a template-based sequence generation model of claim 1, wherein the step of constructing the text relation recognition model further comprises:
after the text semantic feature vectors are obtained, inputting them into the sentence-level learning layer of the text relation recognition model, obtaining the weight of each word of the text semantic feature vector, and computing each word's attention value by a weighted average;
and normalizing the attention values to obtain the predicted relation label of the text semantic feature vector, and classifying the predicted text according to the predicted relation label to obtain the sentence entity.
6. The method of claim 1, wherein the predicted relation labels comprise the 16 labels defined by HowNet plus 5 custom labels.
7. The method of claim 1, wherein the content of the tuple vector to which the input question is mapped comprises at least a subject entity, an entity relation, and an entity, wherein the input question is posed in terms of the subject entity and the entity relation and can be answered by the entity.
8. A device for generating questions with a template-based sequence generation model, comprising:
a text extraction model module, configured to input the text content from which questions are to be generated into the joint model and to classify the text with the joint model to obtain a predicted text;
a text relation recognition model module, configured to train the text relation recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed by the word-level and sentence-level attention mechanisms built into the model, the input sequences corresponding to character vectors and word vectors are obtained, the vectors are classified according to these input sequences, and the predicted relation label of the predicted text is obtained;
a sequence generation model module, wherein the sequence generation model comprises an encoding unit and a template decoding unit: a question is input to the encoding unit, mapped according to its semantics to corresponding tuple vectors, and fed in turn to the template decoding unit; the output of the template decoding unit serves as the question template, which is filled in according to the predicted relation label.
9. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium storing a computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-7.
CN202110181755.6A 2021-02-09 2021-02-09 Method and device for generating questions with a template-based sequence generation model Active CN112836482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110181755.6A CN112836482B (en) 2021-02-09 2021-02-09 Method and device for generating questions with a template-based sequence generation model


Publications (2)

Publication Number Publication Date
CN112836482A true CN112836482A (en) 2021-05-25
CN112836482B CN112836482B (en) 2024-02-23

Family

ID=75933448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110181755.6A Active CN112836482B (en) 2021-02-09 2021-02-09 Method and device for generating questions with a template-based sequence generation model

Country Status (1)

Country Link
CN (1) CN112836482B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9984062B1 (en) * 2015-07-10 2018-05-29 Google Llc Generating author vectors
US20180329884A1 (en) * 2017-05-12 2018-11-15 Rsvp Technologies Inc. Neural contextual conversation learning
CN108897857A (en) * 2018-06-28 2018-11-27 东华大学 The Chinese Text Topic sentence generating method of domain-oriented
US20200073933A1 (en) * 2018-08-29 2020-03-05 National University Of Defense Technology Multi-triplet extraction method based on entity-relation joint extraction model
CN109408812A (en) * 2018-09-30 2019-03-01 北京工业大学 A method of the sequence labelling joint based on attention mechanism extracts entity relationship
CN109582789A (en) * 2018-11-12 2019-04-05 北京大学 Text multi-tag classification method based on semantic primitive information
WO2020232861A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Named entity recognition method, electronic device and storage medium
CN110688491A (en) * 2019-09-25 2020-01-14 暨南大学 Machine reading understanding method, system, device and medium based on deep learning
CN111767409A (en) * 2020-06-14 2020-10-13 南开大学 Entity relationship extraction method based on multi-head self-attention mechanism

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
YUANLI: "Towards Chinese clinical named entity recognition by dynamic embedding using domain-specific knowledge", JOURNAL OF BIOMEDICAL INFORMATICS, vol. 106, pages 1 - 9 *
冯读娟;杨璐;严建峰;: "基于双编码器结构的文本自动摘要研究", 计算机工程, no. 06, pages 66 - 70 *
杨丹浩;吴岳辛;范春晓;: "一种基于注意力机制的中文短文本关键词提取模型", 计算机科学, no. 01, pages 199 - 204 *
董黎刚 等: "面向SDN的动态网络策略部署与实现", 电信科学, vol. 32, no. 10, pages 137 - 138 *
陈文实;: "基于编码解码器与深度主题特征抽取的多标签文本分类", 南京师大学报, vol. 42, no. 4, pages 61 - 68 *

Also Published As

Publication number Publication date
CN112836482B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
CN111738004B (en) Named entity recognition model training method and named entity recognition method
Gallant et al. Representing objects, relations, and sequences
CN111488931B (en) Article quality evaluation method, article recommendation method and corresponding devices
CN111985369A (en) Course field multi-modal document classification method based on cross-modal attention convolution neural network
CN108830287A (en) The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method
CN110888927B (en) Resume information extraction method and system
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN111881262A (en) Text emotion analysis method based on multi-channel neural network
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN115599901B (en) Machine question-answering method, device, equipment and storage medium based on semantic prompt
CN113626589B (en) Multi-label text classification method based on mixed attention mechanism
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
CN114330354A (en) Event extraction method and device based on vocabulary enhancement and storage medium
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN114972823A (en) Data processing method, device, equipment and computer medium
CN113051887A (en) Method, system and device for extracting announcement information elements
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN114648031A (en) Text aspect level emotion recognition method based on bidirectional LSTM and multi-head attention mechanism
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
CN116186237A (en) Entity relationship joint extraction method based on event cause and effect inference
CN113486174B (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN117056451A (en) New energy automobile complaint text aspect-viewpoint pair extraction method based on context enhancement
CN114519353B (en) Model training method, emotion message generation method and device, equipment and medium
CN115659242A (en) Multimode emotion classification method based on mode enhanced convolution graph
CN114911940A (en) Text emotion recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant