CN112836482B - Method and device for question generation by a template-based sequence generation model - Google Patents


Info

Publication number
CN112836482B
CN112836482B (application CN202110181755A)
Authority
CN
China
Prior art keywords
text
model
word
sequence
vector
Prior art date
Legal status
Active
Application number
CN202110181755.6A
Other languages
Chinese (zh)
Other versions
CN112836482A (en)
Inventor
李玉娥
董黎刚
蒋献
吴梦莹
诸葛斌
Current Assignee
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date
Filing date
Publication date
Application filed by Zhejiang Gongshang University
Priority to CN202110181755.6A
Publication of CN112836482A
Application granted
Publication of CN112836482B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for question generation by a template-based sequence generation model. A text extraction model is constructed to classify the text from which questions are to be generated, obtaining a predicted text; a text recognition model is constructed that converts the text into a vector representation, obtains word vectors and the input sequences corresponding to the word vectors based on that representation and on the word-level and sentence-level attention mechanisms set in the model, and classifies based on the input sequences to obtain a predicted relation label for the predicted text; and a sequence generation model is constructed in which the encoding unit receives a question, maps it into a tuple vector and inputs the tuple vector to the template decoding unit, and the question template output by the decoding unit is filled in according to the predicted text and the predicted relation label.

Description

Method and device for question generation by a template-based sequence generation model
Technical Field
The invention relates to the technical field of natural language processing in artificial intelligence, and in particular to a method and a device for question generation by a template-based sequence generation model.
Background
At present, a large part of the research on Chinese question generation in the field of natural language processing is based on templates or rules over a constructed knowledge graph; owing to the limitations of templates and rules, the generated questions are of a single type and lack linguistic flexibility.
Rule-based methods typically require a great deal of manpower and time, and the generated questions often suffer from disfluent sentences and mismatch with the article content. Questions generated by template-based methods are relatively rigid, of a single type and lacking in linguistic diversity, and the quality of the templates directly determines the quality of the generated questions. Question generation based on a sequence model alone suffers from problems such as ambiguous topic-entity recognition, which degrades the quality of the generated questions.
Disclosure of Invention
In view of the above problems in the prior art, the invention provides a method and a device for question generation by a template-based sequence generation model, which combine the template-based method and the sequence generation model, improving the quality of the questions generated by the sequence generation model over a knowledge graph.
In order to achieve the above object, the present invention provides a method for question generation by a template-based sequence generation model, comprising:
constructing a text extraction model: inputting the text from which questions are to be generated into a joint model, and classifying the text with the joint model to obtain a predicted text;
constructing a text relationship recognition model: acquiring text semantic feature vectors from the predicted text, and training the text relationship recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed according to the word-level and sentence-level attention mechanisms set in the text relationship recognition model, so that word vectors and the input sequences corresponding to the word vectors are obtained, and the word vectors are classified according to the corresponding input sequences to obtain a predicted relation label for the predicted text;
constructing a sequence generation model, wherein the sequence generation model comprises an encoding unit and a template decoding unit: a question is input to the encoding unit and mapped into a corresponding tuple vector according to the question semantics; the tuple vector is input in sequence into the template decoding unit, whose output is a question template; and the question template is filled in according to the predicted relation label.
Optionally, the text relationship recognition model comprises an input representation layer, a word-level layer, a sentence-level layer and an entity-relationship classification layer, in which word vectors and word-vector sequences represent the text semantic information, wherein
the input representation layer converts the input words into vector representations and acquires the text semantic feature vectors;
the word-level layer learns the contextual content of the text to obtain the importance of each word to the text semantic information;
the sentence-level layer assigns a different weight to each output word according to the context sentences, and acquires the importance of each word to the sentence information;
the entity-relationship classification layer normalizes the word-to-sentence importance to obtain a relation label for the vector, thereby classifying the relations among entities.
Optionally, the step of constructing the text extraction model includes:
the joint model is a combined structure of a bidirectional long short-term memory (Bi-LSTM) network model and a conditional random field (CRF) model, and the text extraction model comprises a three-layer structure of word-vector representation, sentence feature extraction and sentence-level sequence labeling,
wherein the text data is sequence-labeled to obtain the training-set text,
the training-set text is read as input to the Bi-LSTM network model for unsupervised training, so that the Bi-LSTM network model initializes the weights of the training-set text and builds a feature space;
based on the feature space and the text weights, supervised learning is performed on the training-set text with the CRF model;
the classification probability of each word in the training-set text is obtained with a normalization function;
and classification is performed with the obtained classification probabilities to obtain the predicted text.
Optionally, the step of building the text relationship recognition model includes:
inputting the predicted text into a pre-trained Word2vec model to convert the text into a low-dimensional dense vector representation, which is the text semantic feature vector, and inputting the text semantic feature vector into the word-level layer of the text relationship recognition model to obtain the character-sense information, word-sense information and context information it contains.
Optionally, the step of constructing the text relationship recognition model further includes:
after the text semantic feature vectors are obtained, inputting them into the sentence-level layer of the text relationship recognition model, obtaining the weight of each word in the text semantic feature vector, and obtaining each word's attention value by a weighted-average method;
and normalizing the attention values, obtaining the predicted relation label of the text semantic feature vector from the normalization, and classifying the predicted text according to the predicted relation label to obtain the sentence entity.
Optionally, the predicted relation labels comprise the 16 labels defined by HowNet and 5 custom labels.
Optionally, the content obtained when mapping the input question into the corresponding tuple vector comprises a topic entity, an entity relationship and an entity, wherein the received question is posed according to the topic entity and the entity relationship and can be answered by the entity.
The embodiment of the invention provides a device for question generation by a template-based sequence generation model, comprising:
a text extraction model module, configured to input the text from which questions are to be generated into a joint model and to classify the text with the joint model to obtain a predicted text;
a text relationship recognition model module, configured to acquire text semantic feature vectors from the predicted text and to train the text relationship recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed according to the word-level and sentence-level attention mechanisms set in the model, so that word vectors and the input sequences corresponding to the word vectors are obtained, and the word vectors are classified according to the corresponding input sequences to obtain a predicted relation label for the predicted text;
a sequence generation model module: the sequence generation model comprises an encoding unit and a template decoding unit; a question is input to the encoding unit and mapped into a corresponding tuple vector according to the question semantics; the tuple vector is input in sequence into the template decoding unit, whose output serves as a question template; and the question template is filled in according to the predicted relation label.
An embodiment of the present invention provides a computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of any one of the above methods.
Compared with the prior art, the method for question generation based on the combined template-and-sequence model has the following beneficial effects:
A text extraction model is constructed, and the text from which questions are to be generated is classified by the Bi-LSTM network model and CRF model set within it, obtaining a predicted text. Relation recognition is then performed on the predicted text, and adding the word-level and sentence-level layers to the text relationship model makes the classification of the predicted text more accurate, so that the obtained predicted relation label better matches the text semantics, addressing the problems of disfluent sentences and mismatch with the text content. A question is input to the encoding unit of the sequence generation model, the template decoding unit of the sequence generation model outputs a question template, and the question template is filled in according to the predicted relation label obtained from the text recognition model, i.e. the placeholder label in the output template is replaced by the topic entity, so that the generated questions are flexible in type and rich in language.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
This document provides an overview of various implementations or examples of the technology described in this disclosure, and is not a comprehensive disclosure of the full scope or all of the features of the disclosed technology.
Drawings
FIG. 1 is a schematic diagram of an entity relationship tag in an embodiment of the present invention;
FIG. 2 is a schematic diagram of formats used by the training set and the test set in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a partial example of questions generated based on the knowledge graph in an embodiment of the invention;
FIG. 4 is a schematic diagram of the experimental results of knowledge-graph-based question generation in the practice of the present invention;
FIG. 5 is a schematic diagram of an overall framework of the entity extraction function in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the overall structure of the entity extraction model structure according to the embodiment of the invention;
FIG. 7 is a schematic diagram of an entity relationship identification function framework in an embodiment of the present invention;
FIG. 8 is a schematic diagram of an entity relationship recognition model in an embodiment of the present invention;
FIG. 9 is a schematic representation of the results of a sequence-based generation model in the practice of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in embodiments of the present disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure belongs. The use of the terms "comprising" or "including" and the like in embodiments of the present disclosure means that the elements or articles preceding the term encompass the elements or articles listed after the term and their equivalents, without excluding other elements or articles.
In order to keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and known components are omitted.
The embodiment of the invention provides a method for question generation by a template-based sequence generation model, which specifically comprises the following steps:
Constructing a text extraction model: the text from which questions are to be generated is input into a joint model, and the joint model classifies the text to obtain a predicted text. The text extraction model comprises a three-layer structure of word-vector representation, sentence feature extraction and sentence-level sequence labeling.
The joint model is a combined structure of a bidirectional long short-term memory network model (Bi-LSTM) and a conditional random field model: sentence entities and entity context information are extracted by the Bi-LSTM network, and the conditional random field classifies the entities. The data set is sequence-labeled with the BIO labeling strategy; each word vector in a sentence is passed through a Bi-LSTM layer, and the forward and backward hidden-state sequences of the Bi-LSTM network are concatenated so that the model learns the context information of each word vector. Supervised learning is then performed on the hidden-state sequences output by the Bi-LSTM network using the conditional random field, and finally the classification probability of each word in the sentence is obtained with a normalization function.
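The architecture just described (Bi-LSTM emission scores fed to a CRF over whole tag sequences) can be sketched as follows. This is a minimal illustration assuming TensorFlow 2.x with the tensorflow_addons CRF utilities; the vocabulary size, layer widths and tag set are placeholders of ours, not values taken from the patent.

```python
import tensorflow as tf
import tensorflow_addons as tfa  # provides the CRF log-likelihood

NUM_TAGS = 5                     # e.g. B-PER, I-PER, B-LOC, I-LOC, O
VOCAB, EMB, UNITS = 5000, 128, 64

tokens = tf.keras.Input(shape=(None,), dtype=tf.int32)      # token ids
x = tf.keras.layers.Embedding(VOCAB, EMB, mask_zero=True)(tokens)
# Forward and backward hidden states are concatenated, so each position
# sees both its left and right context.
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(UNITS, return_sequences=True))(x)
emissions = tf.keras.layers.Dense(NUM_TAGS)(x)               # per-word tag scores
model = tf.keras.Model(tokens, emissions)

transitions = tf.Variable(tf.random.normal([NUM_TAGS, NUM_TAGS]))

def crf_loss(tags, emissions, lengths):
    # Score whole tag sequences with the CRF instead of a per-token softmax.
    ll, _ = tfa.text.crf_log_likelihood(emissions, tags, lengths, transitions)
    return -tf.reduce_mean(ll)
```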
In the embodiment of the present invention, referring to FIGS. 5 and 6, the overall framework of the entity extraction function divides the text set data into a training part and a test part for entity extraction; in the entity extraction model structure diagram, "B-PER", "O" and "B-LOC" in the conditional random field (CRF) layer belong to the labeling strategy, and 80% of the text data is used as the training set while the remaining 20% is used as the test set.
In an embodiment of the present invention, referring to FIG. 2, the format used for the training set and the test set includes the question, the triple corresponding to the question, and the answer to the question, so that the final text extraction model captures typical text characteristics.
Specifically, given a text, the data set is constructed by sequence labeling: because the text consists of continuous sentences, each character and word is obtained by word segmentation, so the text can be represented structurally in terms of the characters and words appearing in it. The obtained characters and words are sequence-labeled to obtain the training-set text; at this point the training-set text has not been trained on, and modeling proceeds directly, i.e. the training-set text is used as input to the Bi-LSTM network for unsupervised training (Bi-LSTM unsupervised learning).
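As a concrete illustration of the BIO sequence labeling just described, the following sketch labels a hypothetical segmented sentence; the tokens and entity spans are invented for the example and are not drawn from the patent's data set.

```python
# Each (token, label) pair becomes one line of the sequence-labeled
# training-set text; B- marks the beginning of an entity, I- its
# continuation, and O a token outside any entity.
tokens = ["浙江", "工商", "大学", "位于", "杭州"]
labels = ["B-ORG", "I-ORG", "I-ORG", "O", "B-LOC"]

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```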
The Bi-LSTM network model is used to initialize the weights of the training-set text and construct the feature vectors. Because a typical mathematical model accepts only numerical input, the text is converted into vector representations, comprising character vectors and word vectors; that is, the meaning expressed by a text is encoded with values such as 1 and 0 when represented as vectors. The weight of the training-set text can be understood as finding keywords for the text from which questions are to be generated: the weight represents the importance of a keyword to the text and also reflects, indirectly, the word's ability to predict the text's topic. A vector space model then treats the text set as a set of vectors in the space, with each word corresponding to a coordinate axis carrying the weight of that word, thereby constructing the feature space.
Based on the feature space and the text weights, supervised learning is performed on the training-set text with the conditional random field model, i.e. the CRF supervised learning in FIG. 5 is executed. An optimal model is obtained by training on the labeled training-set text; the model maps the training set to the corresponding outputs, and the outputs are judged for classification. The conditional random field model is the preferred choice of training model, and the training-set text is segmented based on it; the many models available for Chinese word segmentation are not repeated here.
Based on the above steps, the classification probability of each word in the training-set text is obtained with a normalization function, and classification is performed with the obtained classification probabilities to obtain the predicted text.
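The normalization function referred to here is, in the usual formulation, a softmax that rescales a word's per-tag scores into probabilities summing to one; a minimal sketch (the scores are made-up numbers):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # e.g. [0.659 0.242 0.099]
```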
Constructing a text relationship recognition model: text semantic feature vectors are obtained from the predicted text, and the text relationship recognition model is trained with a training set carrying relation labels, wherein the text semantic feature vectors are processed according to the word-level and sentence-level attention mechanisms set in the model, so that word vectors and the input sequences corresponding to the word vectors are obtained, and the word vectors corresponding to the text are classified according to those input sequences.
In the embodiment of the present invention, referring to FIGS. 7 and 8, the entity relationship recognition model is divided into a training part and a test part for entity relationship recognition, with 80% of the text set data used as the training set and the remaining 20% as the test set.
Referring to FIG. 2, in the format used by the training set and the test set in an embodiment of the present invention, the triple content set according to the input question comprises a topic entity, an entity relationship and an entity. In other embodiments, the tuples mapped from the posed question include, but are not limited to, triples, quadruples, etc.
Specifically, the data set is built from the predicted text output by the entity extraction model. Referring to FIG. 8, the entity relationship recognition model adds a Bi-GRU with a word-level attention mechanism (AM) and a Bi-GRU with a sentence-level AM; in addition, the model comprises a basic input representation layer and an entity-relationship classification layer;
the entity-relationship classification layer additionally comprises a hidden layer, the purpose of which is to obtain a clearer and better classification result.
The input representation layer converts the input words into vector representations and acquires the text semantic feature vectors;
the word-level layer learns the contextual content of the text to obtain the importance of each word to the text semantic information;
the sentence-level layer assigns a different weight to each output word according to the context sentences, and obtains the importance of each word to the sentence information.
Specifically, a combined structure of bidirectional gated recurrent unit (Bi-GRU) layers and attention mechanisms is built with the TensorFlow framework, and the entity relationship recognition model is constructed by training with the relation-labeled training data set.
The format of the relation-labeled training data set is the FIG. 2 format mentioned above for the extraction model.
The step of constructing a text relationship recognition model comprises the following steps:
inputting the predicted text into a pre-trained Word2vec model, which converts the continuous words into low-dimensional dense vector representations, namely the text semantic feature vectors, and inputting the text semantic feature vectors into the word-level layer for learning, obtaining the character-sense information, word-sense information and context information they contain, as sketched below;
and using a vector space model to treat the text set as a set of vectors in the space, with each word corresponding to a coordinate axis carrying the weight of that word, thereby constructing the feature space.
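The Word2vec step can be sketched with gensim as follows; the toy corpus, segmentation and vector size are illustrative assumptions, not the patent's training data.

```python
from gensim.models import Word2Vec

# Pre-segmented sentences stand in for the predicted text.
corpus = [["数据", "的", "逻辑", "结构"], ["逻辑", "结构", "的", "概念"]]

model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)
vector = model.wv["逻辑"]   # a low-dimensional dense vector for the word
print(vector.shape)         # (100,)
```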
The step of constructing a text relationship recognition model further comprises:
After the text semantic feature vectors are obtained, sentence-level vectors are needed in addition to word vectors in order to understand the text semantics. It is judged from the sentence vector whether the next sentence continues the current sentence or is noise; that is, a Bi-GRU layer with sentence-level attention is set in the text relationship recognition model. Its function is to assign a different weight to each output word according to the context sentences, obtaining the importance of each word to the sentence information, and the attention value of each word is obtained by a weighted average.
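A minimal sketch of such a sentence-level Bi-GRU with attention pooling, assuming TensorFlow 2.x; the hidden size, scoring layer and input shapes are illustrative choices of ours, not taken from FIG. 8.

```python
import tensorflow as tf

bi_gru = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(64, return_sequences=True))   # (batch, T, 128)
score_layer = tf.keras.layers.Dense(1)                 # one raw score per word

def sentence_vector(word_vectors):
    h = bi_gru(word_vectors)
    weights = tf.nn.softmax(score_layer(h), axis=1)    # each word's weight
    return tf.reduce_sum(weights * h, axis=1)          # weighted-average pooling

x = tf.random.normal([2, 10, 100])   # 2 sentences, 10 words, 100-dim word vectors
print(sentence_vector(x).shape)      # (2, 128)
```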
The attention values are normalized to obtain the classification probability of each word in the training-set text; the predicted relation label of the text semantic feature vector is obtained from the classification probabilities, and, referring to FIG. 1, the predicted text is classified according to the predicted relation label to obtain the topic entity, wherein the error between the predicted relation label and the true label is minimized by adjusting the parameters.
Taking the following text as an example. Input text: "A woman who loses in love usually loses tragically; love belongs to the one who loves best, and the winner is king. Affection can shift and a marriage can freeze at any moment, yet she loves and commits to the marriage." After passing through the text recognition model it is finally classified as "woman", i.e. the sentence entity obtained from this text is "woman". The input text "What is the logical relationship between data?" is classified as "data" after passing through the text recognition model, i.e. the sentence entity of this passage is "data".
Referring to FIG. 1, the predicted relation labels comprise the 16 labels defined by HowNet and 5 custom labels, and the training-set format adopted is: text 1, text 2, relation label, and the sentence in which the text pair co-occurs.
Referring to FIG. 9, a sequence generation model is constructed comprising an encoding unit and a template decoding unit. The encoding unit receives a question and maps it into a corresponding tuple vector; the tuple vector is input in sequence into the template decoding unit, whose output serves as the question template; and the question template is filled in according to the predicted relation label.
In this embodiment, the encoding unit is a triple encoder. A triple F = (topic entity T, entity relationship R, entity O) is input, expressing a question that is posed about the topic entity T, bears the relation R to it, and can be answered by the entity O; an example of the triple is shown in FIG. 2. The triple encoder first maps the triple and the words into a real-valued vector space; the vectors are then input in sequence into the template decoding unit, whose output is a question template in which the topic entity has been replaced by a placeholder label.
The template decoding unit then replaces the placeholder label with the specific topic entity T, thereby converting the question template into a complete question.
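This final filling step reduces to substituting the topic entity back into the decoded template. A minimal sketch follows, in which the placeholder symbol "S" (matching the "S" of the FIG. 3 example below), the template string and the triple are illustrative:

```python
def fill_template(template: str, topic_entity: str, placeholder: str = "S") -> str:
    # Replace the placeholder label left by the template decoder
    # with the concrete topic entity T.
    return template.replace(placeholder, topic_entity)

triple = ("逻辑结构", "定义", "数据元素之间的关系")  # (topic entity T, relation R, entity O)
template = "S的定义是什么？"                         # decoder output with placeholder S
print(fill_template(template, triple[0]))            # -> 逻辑结构的定义是什么？
```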
Referring to FIG. 3, the actual question is "what is the concept of logical structure", and with the method of the present invention the finally generated question is "what is the definition of S?". The semantic prediction is accurate: S is understood as the text topic, i.e. the sentence entity obtained by classification, and S carries the predicted relation label "data", i.e. S can sort a heap of data into a complete binary tree, so the content of S can be understood to include, but not be limited to, the logical structure, which guarantees the completeness of the generated question.
To compare model effects, this embodiment uses the same question-generation data set and experimental environment, designs a knowledge-graph-based question-generation experiment, and compares the correctness of the questions generated by three question-generation methods. The model used here is compared with the template-based and the sequence-based generation models, and the BLEU, METEOR and ROUGE metrics are used to measure the strengths and weaknesses of the three methods; the comparative experimental results are shown in the table of FIG. 4, where the present method performs best.
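By way of illustration, one of the three metrics (BLEU) can be computed with nltk as below; the reference and candidate sentences are invented examples, not the patent's experimental data.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["what", "is", "the", "definition", "of", "logical", "structure"]]
candidate = ["what", "is", "the", "concept", "of", "logical", "structure"]

# Smoothing avoids zero scores on short sentences with missing n-grams.
smooth = SmoothingFunction().method1
print(sentence_bleu(reference, candidate, smoothing_function=smooth))
```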
The invention also provides a device for question generation by a template-based sequence generation model, comprising:
a text extraction model module, configured to input the text from which questions are to be generated into a joint model and to classify the text with the joint model to obtain a predicted text;
a text relationship recognition model module, configured to train the text relationship recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed according to the word-level and sentence-level attention mechanisms set in the model, so that word vectors and the input sequences corresponding to the word vectors are obtained, and the word vectors are classified according to the corresponding input sequences;
a sequence generation model module: the sequence generation model comprises an encoding unit and a template decoding unit; the encoding unit receives a question and maps it into a corresponding tuple vector; the tuple vector is input in sequence into the template decoding unit, whose output serves as the question template; and the question template is filled in according to the predicted relation label.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the above method for question generation by a template-based sequence generation model and of the corresponding system.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of the above method for question generation by a template-based sequence generation model and of the corresponding system.
The above embodiments are illustrative of the present invention and not limiting; any simple modification of the present invention falls within its scope of protection.

Claims (9)

1. A method for question generation by a template-based sequence generation model, characterized by comprising:
constructing a text extraction model: inputting the text from which questions are to be generated into a joint model, and classifying the text with the joint model to obtain a predicted text;
constructing a text relationship recognition model: acquiring text semantic feature vectors from the predicted text, and training the text relationship recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed according to the word-level and sentence-level attention mechanisms set in the text relationship recognition model, so that word vectors and the input sequences corresponding to the word vectors are obtained, and the word vectors are classified according to the corresponding input sequences to obtain a predicted relation label for the predicted text;
constructing a sequence generation model, wherein the sequence generation model comprises an encoding unit and a template decoding unit: a question is input to the encoding unit and mapped into a corresponding tuple vector according to the question semantics; the tuple vector is input in sequence into the template decoding unit, whose output is a question template; and the question template is filled in according to the predicted relation label;
wherein the step of constructing the text extraction model comprises:
the joint model is a combined structure of a bidirectional long short-term memory (Bi-LSTM) network model and a conditional random field (CRF) model, and the text extraction model comprises a three-layer structure of word-vector representation, sentence feature extraction and sentence-level sequence labeling,
wherein the text data is sequence-labeled to obtain the training-set text,
the training-set text is read as input to the Bi-LSTM network model for unsupervised training, so that the Bi-LSTM network model initializes the weights of the training-set text and builds a feature space;
based on the feature space and the text weights, supervised learning is performed on the training-set text with the CRF model;
the classification probability of each word in the training-set text is obtained with a normalization function;
and classification is performed with the obtained classification probabilities to obtain the predicted text.
2. The method for question generation by a template-based sequence generation model of claim 1, wherein the text relationship recognition model comprises an input representation layer, a word-level layer, a sentence-level layer and an entity-relationship classification layer, wherein
the input representation layer converts the input words into vector representations and acquires the text semantic feature vectors;
the word-level layer learns the contextual content of the text to obtain the importance of each word to the text semantic information;
the sentence-level layer assigns a different weight to each output word according to the context sentences, and acquires the importance of each word to the sentence information;
the entity-relationship classification layer normalizes the word-to-sentence importance to obtain a relation label for the vector, thereby classifying the relations among entities.
3. The method for question generation by a template-based sequence generation model of claim 1, wherein the step of constructing the text relationship recognition model comprises:
inputting the predicted text into a pre-trained Word2vec model to convert the text into a low-dimensional dense vector representation, which is the text semantic feature vector, and inputting the text semantic feature vector into the word-level layer of the text relationship recognition model to obtain the character-sense information, word-sense information and context information it contains.
4. The method for question generation by a template-based sequence generation model of claim 1, wherein the step of constructing the text relationship recognition model further comprises:
after the text semantic feature vectors are obtained, inputting them into the sentence-level layer of the text relationship recognition model, obtaining the weight of each word in the text semantic feature vector, and obtaining each word's attention value by a weighted-average method;
and normalizing the attention values to obtain the predicted relation label of the text semantic feature vector, and classifying the predicted text according to the predicted relation label to obtain the sentence entity.
5. The method for question generation by a template-based sequence generation model of claim 1, wherein the predicted relation labels comprise the 16 labels defined by HowNet and 5 custom labels.
6. The method for question generation by a template-based sequence generation model of claim 1, wherein the content obtained when mapping the input question into the corresponding tuple vector comprises at least a topic entity, an entity relationship and an entity, wherein the input question is posed according to the topic entity and the entity relationship and can be answered by the entity.
7. A device for question generation by a template-based sequence generation model, comprising:
a text extraction model module, configured to input the text from which questions are to be generated into a joint model and to classify the text with the joint model to obtain a predicted text;
a text relationship recognition model module, configured to acquire text semantic feature vectors from the predicted text and to train the text relationship recognition model with a training set carrying relation labels, wherein the text semantic feature vectors are processed according to the word-level and sentence-level attention mechanisms set in the text relationship recognition model, so that word vectors and the input sequences corresponding to the word vectors are obtained, and the word vectors are classified according to the corresponding input sequences to obtain a predicted relation label for the predicted text;
a sequence generation model module: the sequence generation model comprises an encoding unit and a template decoding unit; a question is input to the encoding unit and mapped into a corresponding tuple vector according to the question semantics; the tuple vector is input in sequence into the template decoding unit, whose output serves as a question template; and the question template is filled in according to the predicted relation label;
wherein the step of constructing the text extraction model comprises:
the joint model is a combined structure of a bidirectional long short-term memory (Bi-LSTM) network model and a conditional random field (CRF) model, and the text extraction model comprises a three-layer structure of word-vector representation, sentence feature extraction and sentence-level sequence labeling,
wherein the text data is sequence-labeled to obtain the training-set text,
the training-set text is read as input to the Bi-LSTM network model for unsupervised training, so that the Bi-LSTM network model initializes the weights of the training-set text and builds a feature space;
based on the feature space and the text weights, supervised learning is performed on the training-set text with the CRF model;
the classification probability of each word in the training-set text is obtained with a normalization function;
and classification is performed with the obtained classification probabilities to obtain the predicted text.
8. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1-6.
9. A computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, perform the method of any of claims 1-6.
CN202110181755.6A 2021-02-09 2021-02-09 Method and device for question generation by a template-based sequence generation model Active CN112836482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110181755.6A CN112836482B (en) 2021-02-09 2021-02-09 Method and device for question generation by a template-based sequence generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110181755.6A CN112836482B (en) 2021-02-09 2021-02-09 Method and device for question generation by a template-based sequence generation model

Publications (2)

Publication Number Publication Date
CN112836482A CN112836482A (en) 2021-05-25
CN112836482B (en) 2024-02-23

Family

ID=75933448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110181755.6A Active CN112836482B (en) 2021-02-09 2021-02-09 Method and device for question generation by a template-based sequence generation model

Country Status (1)

Country Link
CN (1) CN112836482B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114298001A (en) * 2021-11-29 2022-04-08 腾讯科技(深圳)有限公司 Corpus template generation method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9984062B1 (en) * 2015-07-10 2018-05-29 Google Llc Generating author vectors
CN108897857A (en) * 2018-06-28 2018-11-27 东华大学 The Chinese Text Topic sentence generating method of domain-oriented
CN109408812A (en) * 2018-09-30 2019-03-01 北京工业大学 A method of the sequence labelling joint based on attention mechanism extracts entity relationship
CN109582789A (en) * 2018-11-12 2019-04-05 北京大学 Text multi-tag classification method based on semantic primitive information
CN110688491A (en) * 2019-09-25 2020-01-14 暨南大学 Machine reading understanding method, system, device and medium based on deep learning
CN111767409A (en) * 2020-06-14 2020-10-13 南开大学 Entity relationship extraction method based on multi-head self-attention mechanism
WO2020232861A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Named entity recognition method, electronic device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180329884A1 (en) * 2017-05-12 2018-11-15 Rsvp Technologies Inc. Neural contextual conversation learning
CN109165385B (en) * 2018-08-29 2022-08-09 中国人民解放军国防科技大学 Multi-triple extraction method based on entity relationship joint extraction model

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9984062B1 (en) * 2015-07-10 2018-05-29 Google Llc Generating author vectors
CN108897857A (en) * 2018-06-28 2018-11-27 东华大学 The Chinese Text Topic sentence generating method of domain-oriented
CN109408812A (en) * 2018-09-30 2019-03-01 北京工业大学 A method of the sequence labelling joint based on attention mechanism extracts entity relationship
CN109582789A (en) * 2018-11-12 2019-04-05 北京大学 Text multi-tag classification method based on semantic primitive information
WO2020232861A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Named entity recognition method, electronic device and storage medium
CN110688491A (en) * 2019-09-25 2020-01-14 暨南大学 Machine reading understanding method, system, device and medium based on deep learning
CN111767409A (en) * 2020-06-14 2020-10-13 南开大学 Entity relationship extraction method based on multi-head self-attention mechanism

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Towards Chinese clinical named entity recognition by dynamic embedding using domain-specific knowledge; Yuanli; Journal of Biomedical Informatics; vol. 106; pp. 1-9 *
A keyword extraction model for Chinese short texts based on an attention mechanism; 杨丹浩, 吴岳辛, 范春晓; Computer Science (计算机科学), no. 1; pp. 199-204 *
Research on automatic text summarization based on a dual-encoder structure; 冯读娟, 杨璐, 严建峰; Computer Engineering (计算机工程), no. 6; pp. 66-70 *
Multi-label text classification based on an encoder-decoder and deep topic feature extraction; 陈文实; Journal of Nanjing Normal University (南京师大学报), vol. 42, no. 4; pp. 61-68 *
SDN-oriented dynamic network policy deployment and implementation; 董黎刚 et al.; Telecommunications Science (电信科学), vol. 32, no. 10; pp. 137-138, 148-149 *

Also Published As

Publication number Publication date
CN112836482A (en) 2021-05-25

Similar Documents

Publication Publication Date Title
CN111985369B (en) Course field multi-modal document classification method based on cross-modal attention convolution neural network
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN108830287A (en) The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method
CN113626589B (en) Multi-label text classification method based on mixed attention mechanism
CN110888927A (en) Resume information extraction method and system
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
CN111881292B (en) Text classification method and device
CN113553850A (en) Entity relation extraction method based on ordered structure encoding pointer network decoding
CN111597340A (en) Text classification method and device and readable storage medium
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN114330354A (en) Event extraction method and device based on vocabulary enhancement and storage medium
CN113515632A (en) Text classification method based on graph path knowledge extraction
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN115687610A (en) Text intention classification model training method, recognition device, electronic equipment and storage medium
CN111540470B (en) Social network depression tendency detection model based on BERT transfer learning and training method thereof
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN114067162A (en) Image reconstruction method and system based on multi-scale and multi-granularity feature decoupling
CN112732872A (en) Biomedical text-oriented multi-label classification method based on subject attention mechanism
CN113849653A (en) Text classification method and device
CN116186237A (en) Entity relationship joint extraction method based on event cause and effect inference
CN112836482B (en) Method and device for generating problem by sequence generation model based on template
CN113051904B (en) Link prediction method for small-scale knowledge graph
CN111723649B (en) Short video event detection method based on semantic decomposition
CN116680407A (en) Knowledge graph construction method and device
CN116662924A (en) Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant