CN110162613B - Question generation method, device, equipment and storage medium - Google Patents

Question generation method, device, equipment and storage medium

Info

Publication number
CN110162613B
CN110162613B
Authority
CN
China
Prior art keywords
vector
answer
text
word
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910447602.4A
Other languages
Chinese (zh)
Other versions
CN110162613A (en)
Inventor
高一帆 (Gao Yifan)
李丕绩 (Li Piji)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010169926.9A priority Critical patent/CN111414464B/en
Priority to CN201910447602.4A priority patent/CN110162613B/en
Publication of CN110162613A publication Critical patent/CN110162613A/en
Application granted granted Critical
Publication of CN110162613B publication Critical patent/CN110162613B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a question generation method, apparatus, device and storage medium. The method includes: encoding, through a first encoding model, a first word vector and an answer information vector corresponding to a reference text together with a text position vector corresponding to the current question-answer round number, to obtain a first semantic vector sequence; encoding, through a second encoding model, a second word vector corresponding to the historical question-answer text, to obtain a second semantic vector sequence; decoding the first semantic vector sequence and the second semantic vector sequence through a decoding model, to obtain a question text corresponding to the current round number; and outputting the question text. According to the application, coherent and highly conversational questions can be generated by combining the historical dialogue content.

Description

Question generation method, device, equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a question generation method, apparatus, device, and storage medium.
Background
With the rapid development of artificial intelligence technology, the research content of natural language processing, one of the most important research fields of artificial intelligence, is becoming richer and richer, including, for example, machine translation, automatic summarization, and question generation. Question generation (Question Generation, QG) technology automatically generates corresponding questions from a passage of text, and serves as an advanced form of information retrieval. Question generation technology can be used for knowledge testing in education scenarios; for example, an intelligent tutor system (Intelligent Tutor System) can actively raise reading-comprehension questions to test students' understanding of an article. The technology can also be applied to chat robots and voice assistants, so that the chat system can actively raise questions to enhance the interactivity and persistence of conversations. In addition, question generation technology can be applied to the medical field, for example in an automatic inquiry system that makes a diagnosis through a dialogue with a patient.
Existing question generation methods mainly focus on generating reading-comprehension questions for single sentences of a text. Because the previous dialogue content is not taken into account, the resulting dialogue is often incoherent, which brings a poor user experience.
Disclosure of Invention
The embodiments of the application provide a question generation method, apparatus, device and storage medium, which can generate coherent, highly conversational questions by combining the historical dialogue content.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a question generation method, which comprises the following steps:
encoding a first word vector, an answer information vector and a text position vector corresponding to the current question-answer round number corresponding to the reference text through a first encoding model to obtain a first semantic vector sequence;
encoding a second word vector corresponding to the historical question-answering text through a second encoding model to obtain a second semantic vector sequence;
decoding the first semantic vector sequence and the second semantic vector sequence through a decoding model to obtain a question text corresponding to the current question-answer round number;
and outputting the question text.
An embodiment of the present application provides a question generating apparatus, including:
The first coding module is used for coding a first word vector, an answer information vector and a text position vector corresponding to the current question-answer round number corresponding to the reference text through a first coding model to obtain a first semantic vector sequence;
the second coding module is used for coding a second word vector corresponding to the historical question-answering text through a second coding model to obtain a second semantic vector sequence;
the decoding module is used for decoding the first semantic vector sequence and the second semantic vector sequence through a decoding model to obtain a question text corresponding to the current question-answer round number;
and the output module is used for outputting the question text.
In the above solution, the apparatus further includes:
the training module is used for carrying out joint training on the decoding model at least according to the first optimization objective function and the second optimization objective function so as to adjust the parameters of the decoding model;
the first optimization objective function is used for focusing the attention distribution of the decoding model on entity nouns when pronouns are required to be generated, and the second optimization objective function is used for optimizing the chapter position attention distribution corresponding to each round of question-answer dialogue, so that the chapter position attention distribution of the decoding model is focused on texts corresponding to the current question-answer round number.
An embodiment of the present application provides a question generating device including:
a memory for storing executable instructions;
and the processor is used for realizing the method provided by the embodiment of the application when executing the executable instructions stored in the memory.
The embodiment of the application provides a storage medium which stores executable instructions for realizing the method provided by the embodiment of the application when being executed by a processor.
The embodiment of the application has the following beneficial effects:
When the method provided by the embodiments of the application is used to generate questions, the related vectors of the reference text and the related vectors of the historical question-answer text are respectively encoded and then decoded to obtain the generated question; because the historical dialogue content is combined, the generated question links up better with the historical dialogue content. Moreover, the related vectors of the reference text also include the text position vector corresponding to the current question-answer round number, so the focus of the question can be concentrated on the text corresponding to the current question-answer round number, and the generated question is more targeted.
Drawings
FIG. 1A is a schematic diagram of a gated coreference-knowledge neural question generation model in the related art;
FIG. 1B is a schematic diagram of a network architecture of a question generation method according to an embodiment of the present application;
FIG. 1C is a schematic diagram of another network architecture of a question generation method according to an embodiment of the present application;
FIG. 1D is a schematic diagram of another network architecture of a question generation method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an alternative terminal 400 according to an embodiment of the present application;
FIG. 3A is a schematic flow chart of an implementation of the question generation method according to the embodiment of the present application;
FIG. 3B is a schematic flow chart of an implementation of the question generation method according to the embodiment of the present application;
FIG. 4 is a schematic diagram of an implementation flow of obtaining word vectors of a reference text according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a system architecture of a neural network model of the question generation method according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third" and the like are merely used to distinguish similar objects and do not represent a specific ordering of the objects, it being understood that the "first", "second", "third" may be interchanged with a specific order or sequence, as permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
Before describing the embodiments of the present application in further detail, the terms and terminology involved in the embodiments of the present application are explained; the terms and terminology involved in the embodiments of the present application are subject to the following explanations.
1) Question generation: a technology by which a system automatically generates corresponding questions from a passage of text;
2) Conversationality refers to the degree to which a sentence is suited to appearing in a conversation.
3) A dialog flow (Conversation Flow) refers to a plurality of dialogs occurring sequentially with respect to a time axis, wherein a transition of a theme of the dialog may occur one or more times according to the time axis.
4) The intelligent teacher system is a system capable of automatically giving questions, evaluating the level of students and giving feedback in an educational scene;
5) The loss function (Loss Function), also known as the cost function, is a function that maps the value of a random event or its related random variable to a non-negative real number to represent the "risk" or "loss" of the random event. In applications, the loss function is typically associated with an optimization problem as a learning criterion, i.e., the model is solved and evaluated by minimizing the loss function. For example, it is used for parameter estimation of models in statistics and machine learning, and serves as the optimization objective of a machine learning model.
6) Bilingual Evaluation Understudy (BLEU), an evaluation criterion for measuring the quality of machine translation. BLEU is a weighted geometric average of the precision of N-grams (sequences of N words); the final result is the ratio of the number of correctly matched N-grams in the machine-translated text to the total number of N-grams occurring in it.
7) ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics for evaluating automatic summarization and machine translation. It compares an automatically generated summary or translation with a set of reference summaries (typically manually generated) and derives a corresponding score to measure the "similarity" between the automatically generated summary or translation and the references.
The L in ROUGE-L is the first letter of LCS (Longest Common Subsequence), because ROUGE-L uses the longest common subsequence.
8) The attention (Attention) mechanism determines the attention distribution used by the decoder when decoding the output, based on how well the current input sequence of the decoding model matches the output vector: the higher the degree of matching, the higher the relative score of the corresponding attention point.
9) Word vectors, also known as word embedding (word embedding) or word space embedded representations, are representations of natural language words in word space, meaning vectors that map words to a semantic space.
10) An encoding model, which may also be referred to as an encoder or encoder model, for example a recurrent neural network (Recurrent Neural Network, RNN) model, reads the entire source sequence and encodes it into a fixed-length representation.
11) A decoding model, which may be referred to as a decoder or decoder model, may be any of various RNNs with gating/memory, such as RNNs based on Long Short-Term Memory (LSTM), Transformer models, or RNNs based on gated recurrent units (Gate Recurrent Unit, GRU). The decoding model decodes the input sequence obtained after encoding to output a target sequence.
12) Attention distribution: when decoding each word vector of the semantic vector sequence to output a question, the decoder model determines, through the attention mechanism, the probability distribution over the individual words of the historical question-answer text (the words of the word sequence currently input to the decoder model) being output as the decoding result. For example, when decoding the current word vector, if the probability distribution over the outputs "he" and "Clinton" is (0.6, 0.4), the decoding attention distribution is concentrated on "he".
13) Chapter position attention distribution: when decoding the output question, the decoder model determines a probability distribution over a plurality of different texts of being the text where the correct answer for the current question-answer round number is located. For example, when texts 1, 2 and 3 exist and the probability distribution is (0.6, 0.2, 0.2), the chapter position attention distribution is concentrated on text 1.
In order to better understand the method provided in the embodiments of the present application, the gated coreference-knowledge neural question generation model used for question generation in the related art is first described.
FIG. 1A is a schematic diagram of the gated coreference-knowledge neural question generation model. As shown in FIG. 1A, the model includes at least an Encoder 101 and a Decoder 102; that is, question generation is based on an Encoder-Decoder neural network framework. In implementation, the encoder 101 first converts a segment of English text from a discrete source word representation into a continuous word space embedded representation (word embedding), converts the answer position information into corresponding answer features (answer features), and converts the coreference information into coreference position features (coreference position feature); these three features are then concatenated and fed into the encoder 101. The encoder 101 encodes them into a sequence of semantic vectors (h_1, h_2, …), and the encoded sequence of semantic vectors is then input to the decoder 102. The decoder 102 reads in this semantic vector sequence and generates questions word by word through an attention mechanism (attention) and a recurrent neural network.
The coreference position features can be acquired through the following steps:
Step 11, acquiring coreference pairs and corresponding confidence scores (confidence score) by means of an existing coreference resolution tool.
Taking FIG. 1A as an example, the coreference resolution tool associates the pronoun "they" in a sentence with the most probable referent noun "the Panthers" above, while obtaining a score (mention-pair score) for this coreference pair.
Step 12, after the antecedent ("the Panthers") is inserted before the pronoun ("they"), it is converted into the coreference position feature f_c = (c_1, c_2, …, c_n) to indicate whether a coreference phenomenon occurs at the current position.
Step 13, using the confidence scores of the coreference pairs from step 11, generating a refined coreference position feature f_d = (d_1, d_2, …, d_n) through a gating mechanism (gating), wherein:
d_i = c_i ⊙ g_i, g_i = MLP(c_i, score_i), where ⊙ denotes the element-wise product of the corresponding positions of two vectors, and MLP is a Multi-Layer Perceptron neural network.
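The gating in step 13 can be sketched as follows (a minimal PyTorch illustration; the tensor shapes, the two-layer MLP structure and the variable names are assumptions for illustration rather than the exact structure of the related art):

    import torch
    import torch.nn as nn

    class CorefGate(nn.Module):
        # refines the coreference position feature c_i with a gate g_i = MLP(c_i, score_i)
        def __init__(self, feat_dim):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim + 1, feat_dim),
                nn.ReLU(),
                nn.Linear(feat_dim, feat_dim),
                nn.Sigmoid(),
            )

        def forward(self, c, score):
            # c: [seq_len, feat_dim] coreference position features; score: [seq_len, 1] mention-pair confidence scores
            g = self.mlp(torch.cat([c, score], dim=-1))   # g_i = MLP(c_i, score_i)
            return c * g                                  # d_i = c_i ⊙ g_i (element-wise product)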
The above technique has the following problems: 1) when generating a question, only the English text itself is considered; in a dialogue, the previous dialogue content is not modeled, so the generated question is not coherent with the dialogue history; 2) without modeling the conversation flow, the focus transitions between questions cannot be planned when testing students' understanding of an English text.
Based on the above, the embodiments of the application provide a question generation method which generates questions by combining the historical dialogue content, thereby ensuring coherence between question-answer dialogues; moreover, when generating questions, the text position information corresponding to the current round number is also considered, so that focus transitions between multiple questions can be realized.
An exemplary application of an apparatus implementing the embodiment of the present application is described below, where the apparatus provided in the embodiment of the present application may be implemented as a terminal, and may also be implemented as a server. In the following, exemplary applications covered when the apparatus is implemented as a terminal and a server, respectively, will be described.
Referring to fig. 1B, fig. 1B is a schematic diagram of a network architecture of a question generation method according to an embodiment of the present application. To support an exemplary application, a communication connection is established between first terminals 400 and a second terminal 100 through a network 300, where the network 300 may be a wide area network or a local area network, or a combination of the two, and wireless links are used to implement data transmission. The question generation method provided by the embodiment of the present application may be applied to an online education scenario; assuming that the second terminal 100 is a teacher terminal and the first terminals 400 are student terminals, two first terminals 400-1 and 400-2 are exemplarily shown in fig. 1B.
Under this network architecture, the second terminal 100 may first send an article to the first terminal 400-1 and the first terminal 400-2, and after learning the article, the users corresponding to the first terminal 400-1 and the first terminal 400-2 may take a test. At this time, the second terminal 100 may generate a question based on the article, send the question to the first terminal 400-1 and the first terminal 400-2, and after the users corresponding to the first terminal 400-1 and the first terminal 400-2 answer, continue to generate questions according to the users' answer results and the article, respectively.
Fig. 1C is a schematic diagram of another network architecture of a question generation method according to an embodiment of the present application. To support an exemplary application, a third terminal 500 is connected to a server 200 through a network 300. The third terminal 500 may be an intelligent terminal on which an application (App) capable of conducting session chat may be installed; the third terminal 500 may also be an intelligent chat robot. The network 300 may be a wide area network or a local area network, or a combination of both, using wireless links to effect data transmission.
The third terminal 500 may collect voice dialogue information of the user and itself, then send the collected voice dialogue information to the server 200, and the server 200 generates a question based on the voice dialogue information of the user, and sends the question to the third terminal 500, and the third terminal 500 outputs the question, for example, may output the question in a voice manner. In the subsequent session, the server 200 may continue to generate questions based on the previous session content, so that the session communication between the third terminal 500 and the user is consistent and smooth, thereby giving the user a good communication experience.
Fig. 1D is a schematic diagram of a network architecture of another question generation method according to an embodiment of the present application. As shown in fig. 1D, the network architecture includes only a third terminal 500, and the third terminal 500 may be a smart phone, a tablet computer, a notebook computer, or the like, or may be a chat robot. The third terminal 500 is schematically shown in fig. 1D in the form of a chat robot. After collecting the voice dialogue information of the user, the third terminal 500 generates a question according to the collected voice dialogue information and outputs the question. In subsequent sessions, the third terminal 500 may again generate questions based on the previous session content, so that the session communication between the third terminal 500 and the user is coherent and smooth, thereby giving the user a good communication experience.
The apparatus provided in the embodiments of the present application may be implemented in hardware or a combination of hardware and software, and various exemplary implementations of the apparatus provided in the embodiments of the present application are described below.
Referring to fig. 2, fig. 2 is an optional structural schematic diagram of a terminal 400 according to an embodiment of the present application. The terminal 400 may be a mobile phone, a computer, a digital broadcasting terminal, an information transceiver device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc. According to the structure of the terminal 400, an exemplary structure when the apparatus is implemented as a server can be envisaged, and thus the structure described herein should not be considered a limitation; for example, some components described below may be omitted, or components not described below may be added to adapt to the specific requirements of some applications.
The terminal 400 shown in fig. 2 includes: at least one processor 410, a memory 440, at least one network interface 420, and a user interface 430. The components in the terminal 400 are coupled together by a bus system 450. It can be understood that the bus system 450 is used to implement connection and communication between these components. In addition to a data bus, the bus system 450 includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are all labeled as the bus system 450 in fig. 2.
The user interface 430 may include a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, touch screen, or the like.
Memory 440 may be volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM). The volatile memory may be random access memory (RAM, random Access Memory). The memory 440 described in embodiments of the present application is intended to comprise any suitable type of memory.
The memory 440 in the embodiment of the present application is capable of storing data to support the operation of the terminal 400. Examples of such data include: any computer programs for operating on the terminal 400, such as an operating system and application programs. The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application may comprise various applications.
As an example of implementation of the method provided by the embodiment of the present application by combining software and hardware, the method provided by the embodiment of the present application may be directly embodied as a combination of software modules executed by the processor 410, the software modules may be located in a storage medium, the storage medium is located in the memory 440, and the processor 410 reads executable instructions included in the software modules in the memory 440, and performs the method provided by the embodiment of the present application in combination with necessary hardware (including, for example, the processor 410 and other components connected to the bus 450).
By way of example, the processor 410 may be an integrated circuit chip having signal processing capabilities such as a general purpose processor, such as a microprocessor or any conventional processor, a digital signal processor (DSP, digital Signal Processor), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
Methods of implementing embodiments of the present application will be described in conjunction with the foregoing exemplary applications and implementations of an apparatus implementing embodiments of the present application.
Referring to fig. 3A, fig. 3A is a schematic flow chart illustrating one implementation of the question generation method according to the embodiment of the present application. The question generation method according to the embodiment of the present application may be applied to the second terminal 100 shown in fig. 1B, the server 200 shown in fig. 1C, and the third terminal 500 shown in fig. 1D, which implement the question generation method by running a neural network model for question generation.
In some embodiments, the question generation neural network model includes: a first encoding model, a second encoding model and a decoding model, wherein the first encoding model is used for encoding the reference text to obtain a first semantic vector sequence, the second encoding model is used for encoding the historical dialogue text to obtain a second semantic vector sequence, and the decoding model is used for decoding the first semantic vector sequence and the second semantic vector sequence to obtain the generated question.
The first encoding model and the second encoding model may be encoding models of the same type, e.g. both RNN models, but the parameters of the first encoding model and the second encoding model may differ. The first encoding model may correspond to the chapter encoder 501 shown in fig. 5, and the second encoding model may correspond to the conversation encoder 502 shown in fig. 5.
The question generation method provided by the embodiment of the present application will be described below with reference to the steps shown in fig. 3A.
In step S201, a first word vector, an answer information vector, and a text position vector corresponding to the current question-answer number corresponding to the reference text are encoded by a first encoding model, so as to obtain a first semantic vector sequence.
Here, the first word vector may be a vector representation of all words in the reference text in a continuous word space, or a vector representation of words in the reference text that are not answers in a continuous word space.
The answer information vector is a vector representation, in a continuous word space, of the words marked as answers in the reference text. In the embodiment of the application, the reference text may be divided into several parts in advance; for example, the reference text may be divided by sentences, with each sentence or every two sentences forming a part, or the reference text may be divided by the number of words, with every ten or fifteen words forming a part. Each part has position information of that part in the reference text, which may include the position, in the reference text, of the first word of the part and the position of the last word of the part. The question-answer round number has a correspondence with the parts of the reference text: one question-answer round number may correspond to the text content of one part, or a plurality of question-answer round numbers may correspond to the text content of one part. The text position vector corresponding to the current question-answer round number is generated according to the text position information corresponding to the current question-answer round number.
When step S201 is implemented, the first word vector, the answer information vector and the text position vector corresponding to the current question-answer round number are first acquired and used as the input of the first encoding model. When encoding the first word vector, the answer information vector and the text position vector, the encoder first splices them and then encodes the spliced vector: the variable-length input vector obtained after splicing is transformed into an intermediate variable with a fixed length, and the sequence information of the spliced vector is encoded in the intermediate variable, so that the first semantic vector sequence is obtained.
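A minimal sketch of the encoding in step S201 is given below (a bidirectional GRU encoder and the dimensions are illustrative assumptions; the embodiment does not limit the specific recurrent unit, and it is assumed here that every word of the reference text carries a word feature, an answer feature and a text position feature):

    import torch
    import torch.nn as nn

    class FirstEncoder(nn.Module):
        # first coding model: splices [text position vector; answer information vector; first word vector] and encodes them
        def __init__(self, word_dim=300, ans_dim=32, pos_dim=32, hidden=256):
            super().__init__()
            self.rnn = nn.GRU(word_dim + ans_dim + pos_dim, hidden,
                              batch_first=True, bidirectional=True)

        def forward(self, pos_vec, answer_vec, word_vec):
            # each input: [batch, seq_len, dim]
            spliced = torch.cat([pos_vec, answer_vec, word_vec], dim=-1)   # spliced vector
            first_semantic_seq, _ = self.rnn(spliced)                      # first semantic vector sequence
            return first_semantic_seq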
In step S202, a second word vector corresponding to the historical question-answer text is encoded by a second encoding model, so as to obtain a second semantic vector sequence.
In some embodiments, the above step S202 may be implemented in such a way that: the second encoding model transforms the second word vector into a context variable having a fixed length and encodes the second word vector in the context variable, resulting in a second sequence of semantic vectors.
In some embodiments, before step S202, the second word vector corresponding to the historical question-answer text is acquired. In implementation, the original word vectors of the words in the historical question-answer text are first acquired, and then the original word vectors are mapped into second word vectors. The original word vectors may be one-hot vectors; when mapping each original word vector to a second word vector, each original word vector may be used as an input of a word2vec (word to vector) model, and the low-dimensional second word vector is obtained through word2vec.
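A minimal sketch of mapping the one-hot original word vectors of the historical question-answer text to low-dimensional second word vectors is given below (the vocabulary size and dimension are illustrative assumptions; the embedding weights could be initialized from a pretrained word2vec model):

    import torch
    import torch.nn as nn

    vocab_size, embed_dim = 50000, 300
    embedding = nn.Embedding(vocab_size, embed_dim)    # rows may be loaded from a word2vec model

    # word ids of the historical question-answer text (equivalent to one-hot indices)
    history_ids = torch.tensor([[12, 408, 7, 2331]])
    second_word_vec = embedding(history_ids)           # [1, 4, 300] low-dimensional second word vectors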
In step S203, decoding the first semantic vector sequence and the second semantic vector sequence through a decoding model to obtain a question text corresponding to the current question-answer number.
Since the text position corresponding to the current question-answer round number is one of the inputs used when the first semantic vector sequence is generated, and the second semantic vector sequence is obtained by encoding the second word vectors corresponding to the historical question-answer content, when the decoding model decodes the first semantic vector sequence and the second semantic vector sequence word by word, the obtained question links up with the historical question-answer content, and its relevance to the text content corresponding to the current round number can be ensured. When the value of the current round number changes, questions related to the text content of the new round number can be generated, so that the topic is gradually transferred over a multi-round dialogue, and questions in multi-round question answering are better generated.
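One word-by-word decoding step that attends over both semantic vector sequences can be sketched as follows (a GRU cell with dot-product attention is assumed purely for illustration; the embodiment does not restrict the decoder to this form):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DecoderStep(nn.Module):
        def __init__(self, hidden=512, vocab=50000):
            super().__init__()
            self.cell = nn.GRUCell(hidden, hidden)
            self.out = nn.Linear(hidden * 3, vocab)

        def forward(self, prev_word_vec, state, first_seq, second_seq):
            state = self.cell(prev_word_vec, state)
            ctx1 = self.attend(state, first_seq)    # attention over the reference-text semantic vectors
            ctx2 = self.attend(state, second_seq)   # attention over the historical question-answer semantic vectors
            logits = self.out(torch.cat([state, ctx1, ctx2], dim=-1))   # distribution over the next word
            return logits, state

        @staticmethod
        def attend(state, seq):
            scores = torch.bmm(seq, state.unsqueeze(-1)).squeeze(-1)    # [batch, len] matching scores
            weights = F.softmax(scores, dim=-1)                         # attention distribution
            return torch.bmm(weights.unsqueeze(1), seq).squeeze(1)      # weighted context vector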
In step S204, the question text is output.
Here, if step S204 is implemented by the second terminal in fig. 1B or the server in fig. 1C, the outputting of the question text may be sending the generated question text to the first terminal shown in fig. 1B or to the third terminal shown in fig. 1C, so that the first terminal or the third terminal may display the question text in text form through the display device, or may, of course, play the voice information corresponding to the question text in voice form by the first terminal or the third terminal.
If step S204 is implemented by the third terminal in fig. 1D, the outputting of the question text may be that the third terminal displays the question text in text form through the display device, or may, of course, also be that the third terminal plays the voice information corresponding to the question text in voice form.
When the method provided by the embodiments of the application is used to generate questions, the related vectors of the reference text and the related vectors of the historical question-answer text are respectively encoded and then decoded to obtain the generated question; because the historical dialogue content is combined, the generated question links up better with the historical dialogue content. Moreover, the related vectors of the reference text also include the text position vector corresponding to the current question-answer round number, so the focus of the question can be concentrated on the text corresponding to the current question-answer round number, and the generated question is more targeted.
In some embodiments, before step S201, it is necessary to first acquire a first word vector and an answer information vector corresponding to the reference text. Referring to fig. 4, fig. 4 is a schematic flowchart of an implementation of obtaining a word vector of a reference text according to an embodiment of the present application, as shown in fig. 4, including the following steps:
step S111, original word vectors and attribute information corresponding to each word in the reference text are obtained.
When the first word vector and the answer information vector corresponding to the reference text are acquired, the original word vector automatically generated for each segmented word in the reference text is first acquired. The original word vector may be a one-hot vector, whose dimension is generally high; therefore, after the original word vector of each segmented word is acquired, each original word vector is mapped into a low-dimensional continuous word vector space.
The attribute information characterizes whether the segmented word is an answer.
Step S112, judging whether the attribute information of the segmented word indicates that the segmented word is an answer.
Here, if the attribute information of the segmented word indicates that the word is an answer, the process proceeds to step S114; if the attribute information indicates that the word is not an answer, the process proceeds to step S113.
Step S113, when the attribute information of the segmented word indicates that the word is not an answer, mapping the original word vector corresponding to the word into a first word vector.
For example, for the sentence "Incumbent democratic president Bill Clinton was ineligible to serve a third term due to term limitations in the 22nd amendment of the constitution", the attributes of "democratic" and "term limitations in the 22nd amendment of the constitution" are answers, and the attributes of the remaining words are non-answers. The first word vector includes at least the low-dimensional word vectors mapped from the original word vectors of the words Incumbent, president, Bill, Clinton, was, ineligible, to, serve, a, third, term, due, to.
Step S114, when the attribute information of the segmented word indicates that the word is an answer, mapping the original word vector corresponding to the word into an answer information vector.
By way of example, the answer information vector includes at least the low-dimensional word vectors mapped from the original word vectors of the words democratic, term, limitations, in, the, 22nd, amendment, of, the, constitution. In some embodiments, the answer information vector may also include a position information component of the answer.
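Steps S111 to S114 can be sketched as follows (the shared embedding table, the variable names and the boolean attribute mask are illustrative assumptions):

    import torch
    import torch.nn as nn

    embedding = nn.Embedding(50000, 300)        # maps original (one-hot) word ids to low-dimensional vectors

    def build_vectors(token_ids, is_answer):
        # token_ids: [seq_len] original word ids; is_answer: [seq_len] attribute information of each segmented word
        vectors = embedding(token_ids)
        first_word_vec  = vectors[~is_answer]   # words whose attribute indicates "not an answer" (step S113)
        answer_info_vec = vectors[is_answer]    # words whose attribute indicates "answer"        (step S114)
        return first_word_vec, answer_info_vec

    ids = torch.tensor([3, 17, 9, 51, 8])
    ans = torch.tensor([False, True, True, False, False])
    first_word_vec, answer_info_vec = build_vectors(ids, ans)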
In some embodiments, step S201, i.e., encoding, through the first encoding model, the first word vector and the answer information vector corresponding to the acquired reference text together with the text position vector corresponding to the current question-answer round number to obtain the first semantic vector sequence, may be implemented through the following steps:
Step S2011, splicing the first word vector, the answer information vector and the text position vector to obtain a spliced vector.
Here, when the first word vector, the answer information vector and the text position vector are spliced, they may be spliced in a preset order. For example, the preset order may be: the text position vector first, the answer information vector second, and the first word vector third; the text position vector, the answer information vector and the first word vector are then spliced in this order to obtain the spliced vector.
Step S2012, converting the spliced vector into an intermediate vector having a fixed length through the first encoding model.
Here, the intermediate vector may be a context vector (Context Vector), i.e., a weighted context vector of each word, through which a contextual relationship can be established between one word and the other words.
Step S2013, encoding the sequence information of the spliced vector in the intermediate vector to obtain the first semantic vector sequence.
Since the lengths of the first word vector, the answer information vector and the text position vector are generally different for different reference texts, the lengths of the obtained spliced vectors are different, and the first coding model codes the spliced vectors into a first semantic vector sequence with a fixed length when the spliced vectors are coded.
In steps S2011 to S2013, the first word vector, the answer information vector and the text position vector are spliced, and the spliced vector is then encoded, which ensures that the first semantic vector sequence contains information of all three aspects and improves the encoding efficiency.
In some embodiments, as shown in fig. 3B, following step S204, the following procedure may also be performed:
step S205, obtaining answer texts corresponding to the question texts.
Here, the answer text may be input by the user through the input device, or may be uttered by the user through voice. Therefore, when step S205 is implemented, the answer text input by the user through the input device may be obtained, or the answer voice information of the user may be obtained, and then the answer voice information is subjected to voice recognition to obtain the answer text.
Step S206, judging whether the answer text and the preset answer text meet the matching relation.
If the answer text and the preset answer text satisfy the matching relationship, it indicates that the answer text matches the preset answer text, that is, the answer is correct; at this time, questions can continue to be generated according to the subsequent part of the article, and the process proceeds to step S207. If the answer text and the preset answer text do not satisfy the matching relationship, it indicates that the answer text does not match the preset answer text, that is, the answer is wrong, and the process proceeds to step S208.
Step S207, updating the value of the current question-answer round number to the value of the next question-answer round number.
For example, assuming that the value of the current question-answer round number is 2, if the answer text and the preset answer text satisfy the matching relationship, the value of the current question-answer round number is updated to 3, and then step S201 is performed again until the current question-answer round number reaches the preset end round number.
Step S208, maintaining the value of the current question-answering round number.
For example, assuming that the value of the current question-answer round number is 2, if the answer text and the preset answer text do not satisfy the matching relationship, the value of the current question-answer round number remains unchanged at 2, and then step S201 is performed.
In some embodiments, when the answer text and the preset answer text do not satisfy the matching relationship, 1 is added to the number of wrong answers corresponding to the question, and when the number of wrong answers exceeds a specified threshold, the value of the current question-answer round number is updated to the value of the next question-answer round number. For example, if the threshold is 3 and the user answers a question incorrectly more than 3 times, questions then continue to be generated from the subsequent part of the article.
Note that the initial value of the number of wrong answers for each question is 0.
Through steps S205 to S208, whether to perform subsequent question generation can be determined according to whether the answer text of the user is correct, so that the knowledge points which are not well mastered are further consolidated in the question-answering process while the consistency of the dialogue is ensured.
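The round-number update logic of steps S205 to S208, including the wrong-answer counter, can be sketched in plain Python (the matches() helper, the threshold of 3 and the end round are assumptions used only for illustration):

    def next_round(current_round, answer_text, preset_answer, wrong_counts, max_wrong=3, end_round=10):
        # wrong_counts: dict mapping a question's round number to its number of wrong answers, initially 0
        # matches() is a hypothetical answer-matching helper
        if matches(answer_text, preset_answer):        # the answer text satisfies the matching relationship
            return min(current_round + 1, end_round)   # step S207: update to the next round number
        wrong_counts[current_round] = wrong_counts.get(current_round, 0) + 1
        if wrong_counts[current_round] > max_wrong:    # too many wrong answers: move on to the next round
            return min(current_round + 1, end_round)
        return current_round                           # step S208: maintain the current round number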
In some embodiments, the above step S203 may be implemented in such a way that:
Word-by-word decoding is performed on the first semantic vector sequence and the second semantic vector sequence through the decoding model; during decoding, the chapter position attention distribution of the decoding model is concentrated on the text corresponding to the current round number, and when a pronoun is decoded and output, the attention distribution of the decoding model is concentrated on the entity nouns in the second semantic vector sequence, so as to generate a coreference-aligned question text corresponding to the current round number.
Here, step S203 is implemented by performing word-by-word decoding on the first semantic vector sequence and the second semantic vector sequence using an already trained decoding model. When the decoding model is trained, its parameters are adjusted according to a loss function capable of realizing coreference alignment and a loss function capable of concentrating the attention distribution on the text position corresponding to the current question-answer round number; therefore, when the trained decoding model performs word-by-word decoding on the first semantic vector sequence and the second semantic vector sequence, the generated questions realize coreference alignment and are related to the text content corresponding to the current question-answer round number, which improves the conversationality of the question answering.
In some embodiments, the method further comprises:
And step 31, acquiring an initial decoding model and a third semantic vector sequence corresponding to a training text.
Here, the training text may be a question-answer dialogue text, and the third semantic vector sequence corresponding to the training text further includes part-of-speech information for characterizing each word, so as to determine whether a pronoun needs to be generated in the decoding process.
And step 32, when the third semantic vector sequence is decoded through the decoding model and the generation of the pronouns is determined, adjusting parameters of the decoding model according to a first optimization objective function so as to concentrate the attention distribution of the decoding model to entity nouns.
Here, when the decoding model decodes the third semantic vector sequence and determines that a pronoun needs to be generated, the parameters of the decoding model may be adjusted through the first optimization objective function, so that the attention distribution of the decoding model is concentrated on the entity noun corresponding to the pronoun, thereby ensuring that the pronouns in the generated question correspond to entity nouns, and further improving the conversationality of the question answering.
In some embodiments, the method further comprises:
step 41, performing joint training on the decoding model at least according to a first optimization objective function and a second optimization objective function so as to adjust parameters of the decoding model;
The first optimization objective function is used for focusing the attention distribution of the decoding model on entity nouns when pronouns are required to be generated, and the second optimization objective function is used for optimizing the chapter position attention distribution corresponding to each round of question-answer dialogue, so that the chapter position attention distribution of the decoding model is focused on texts corresponding to the current question-answer round number.
In the actual implementation process, besides performing joint training on the decoding model according to the first optimization objective function and the second optimization objective function, the decoding model may be further subjected to joint training according to the formula (1-1) so as to adjust parameters of the decoding model:
L = L_nll + L_coref + L_flow    (1-1);
where L_nll is the optimization objective of the classical encoder-decoder model, L_coref is the first optimization objective function, and L_flow is the second optimization objective function.
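A minimal sketch of a joint training step under formula (1-1) is given below (the decoding model, the compute_losses() helper and the optimizer choice are assumptions; only the combination of the three loss terms follows the formula):

    import torch

    # "model" stands for the decoding model being jointly trained (hypothetical, for illustration)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    l_nll, l_coref, l_flow = compute_losses(batch)   # hypothetical helper returning the three scalar loss terms
    loss = l_nll + l_coref + l_flow                  # L = L_nll + L_coref + L_flow  (1-1)

    optimizer.zero_grad()
    loss.backward()                                  # backpropagation adjusts the parameters of the decoding model
    optimizer.step()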
In the following, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
Fig. 5 is a schematic diagram of a system architecture of the question generation method according to an embodiment of the present application. As shown in fig. 5, the system architecture includes at least a chapter encoder (passage encoder) 501, a conversation encoder 502, and a decoder 503. The chapter encoder 501 encodes the English text and the conversation encoder 502 encodes the historical dialogue content, so as to respectively convert the discrete text into continuous vectorized semantic representations.
The chapter encoder 501 encodes the short English text; in fact, the chapter encoder 501 encodes a series of related vectors of the short English text. Before encoding, the short English text must first be converted from a discrete source word representation into a continuous word space embedded representation (word embedding), and the answer position information is simultaneously converted into a corresponding answer position representation (answer position embedding). In addition, a dialogue flow representation (turn number embedding & chunk embedding) is separately designed in the embodiment of the present application. The word space embedded representation, the answer position representation and the dialogue flow representation are then encoded by the chapter encoder 501 to obtain a vectorized semantic representation. The vectorized semantic representation encoded by the chapter encoder 501 corresponds to the first semantic vector sequence in other embodiments.
The conversation encoder 502 encodes the historical dialogue content, i.e., the previous questions and answers. In implementation, the conversation encoder 502 encodes the word space embedded representation corresponding to the historical dialogue content to obtain a vectorized semantic representation. The vectorized semantic representation encoded by the conversation encoder 502 corresponds to the second semantic vector sequence in other embodiments.
After the vectorized semantic representations are obtained, the decoder 503 converts them into questions, using coreference alignment (co-reference alignment) and conversation flow modeling (conversation flow modeling) techniques during word-by-word decoding, which enables the model to generate conversational questions.
The coreference alignment technique and the dialogue flow module are described below.
In this embodiment, the coreference alignment module, which adopts the coreference alignment technique, can convert entity nouns (entities) in the historical dialogue content into pronouns in the dialogue to be generated, for example:
english short text segment: incumbent democratic president Bill Clinton was ineligible to serve a third term due to term limitations in the 22 and nd amendment of the constitution.
Conversation history: What political party was Clinton a member of?
Results of the related technical scheme: what was Clinton ineligible to serve?
Result of the coreference alignment module: What was he ineligible to serve?
As can be seen from comparing the question generated by the related art with the question generated with coreference alignment, the coreference alignment module is capable of generating questions rich in conversationality, so that the questions stay coherent with the historical dialogue content. The related art scheme, in contrast, can only copy the entity nouns from the dialogue history and cannot achieve coreference alignment.
When the coreference alignment module is implemented, whenever the model needs to generate a pronoun (such as "he" in the example), the attention distribution (attention distribution) of the model is encouraged to concentrate on the entity noun in the dialogue history (such as "Clinton" in the example), so that the generated pronoun is aligned with the entity noun in the historical dialogue content. The optimization objective can be expressed by the loss function L_coref described by formula (2-1):
where β_{i-k,j} is the attention distribution over the sentence, from which the attention distribution probability corresponding to the referred entity noun is obtained; p_coref is the probability of generating the pronoun in the output probability distribution; s_c is the confidence probability of the coreference pair; and λ1 and λ2 are hyper-parameters for adjusting the optimization objective, with an empirical value of 1.
The dialogue flow module can realize focus transfer across consecutive questions: the questions in the first few rounds of the dialogue concern the content of the first few sentences of the short text, and as the dialogue deepens, the questions gradually attend to the later content of the short text until the dialogue ends.
In the dialogue flow module, an embedded representation method of the dialogue flow (flow embedding) is first designed, that is, the dialogue flow representation input into the chapter encoder 501. The dialogue flow representation includes two parts: a current round number representation (turn number embedding) and a chapter relative position representation (chunk embedding), wherein:
the current round number representation (turn number embedding) maps the round number of the current dialogue to a continuous vectorized representation that is added to the chapter encoder, so that the model can learn which round the current dialogue is in and focus on the appropriate location in the text.
In the present embodiment, it is assumed that the chapter (passage) is divided uniformly into 10 parts according to sentences, so that the model learns the relative relationship between the round number and the chapter position. Intuitively, in the initial dialogue rounds the model should pay attention to the sentences in the front part of the chapter and generate corresponding questions for some of those front sentences, and as the dialogue deepens the model should pay attention to the sentences in the later part of the chapter and generate corresponding questions for the later sentences. In this way, the model can generate questions in the order in which the chapter is narrated, which better matches the rhythm of questioning in a dialogue.
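The dialogue flow representation (turn number embedding plus chunk embedding) can be sketched as follows (assuming the 10 uniform chunks of the example above; the maximum number of rounds and the embedding size are illustrative assumptions):

    import torch
    import torch.nn as nn

    class FlowEmbedding(nn.Module):
        # maps the current round number and the chunk index of each word to continuous representations
        def __init__(self, max_turns=20, num_chunks=10, dim=16):
            super().__init__()
            self.turn_embed  = nn.Embedding(max_turns, dim)    # turn number embedding
            self.chunk_embed = nn.Embedding(num_chunks, dim)   # chunk (chapter relative position) embedding

        def forward(self, turn_number, chunk_ids):
            # turn_number: [batch] current question-answer round; chunk_ids: [batch, seq_len] chunk index of each word
            turn = self.turn_embed(turn_number).unsqueeze(1).expand(-1, chunk_ids.size(1), -1)
            chunk = self.chunk_embed(chunk_ids)
            return torch.cat([turn, chunk], dim=-1)             # per-word flow embedding fed into the chapter encoder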
Second, the dialogue flow module optimizes the chapter attention (passage attention) distribution of each round of dialogue; by labeling the sentences in the chapter, the attention distribution α_j of the model is restricted so that the model only focuses on the relevant content in the short text. The optimization objective can be expressed by the loss function L_flow described by formula (2-2):
where λ3 and λ4 are hyper-parameters for adjusting the optimization objective, with an empirical value of 1; α_j is the chapter attention distribution probability; CES means that the sentence belongs to the sentences that the current question needs to ask about, and HES means that the sentence belongs to the sentences that have been asked about previously.
In the embodiment of the application, the alignment module and the dialogue flow modeling module are jointly trained, and the optimization target of the joint training can be expressed by a formula (2-3):
L = L_nll + L_coref + L_flow (2-3);
wherein L_nll is the optimization objective (the negative log-likelihood loss) of the classical encoder-decoder model.
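A minimal sketch of one joint-training step under formula (2-3), written in PyTorch; the interface in which the model returns the three loss terms for a batch is assumed only for illustration:

```python
def training_step(model, batch, optimizer):
    # Joint objective of formula (2-3): the three losses are summed and
    # back-propagated together so that all modules are trained jointly.
    nll_loss, coref_loss, flow_loss = model(batch)   # assumed interface
    loss = nll_loss + coref_loss + flow_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```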
The method provided by the embodiment of the application can be applied to an intelligent tutor system. The application scenario is a short text in English: the system asks the student a series of questions in the form of a dialogue to test the student's understanding of the short text. Compared with previous models, which mainly focus on raising questions for individual sentences separately, the system provided by the application can interact in a dialogue manner.
The results of the evaluation comparison with the prior art solutions are shown in table 1:
TABLE 1
                    BLEU1    BLEU2    BLEU3    ROUGE_L
Existing scheme     28.84    13.74     8.16      39.18
Proposed scheme     37.38    22.81    16.25      46.90
As can be seen from Table 1, the method provided by the embodiment of the present application is superior to the existing scheme under the BLEU1, BLEU2, BLEU3 and ROUGE_L evaluation metrics.
The embodiment of the application can be applied to the scenario in which an intelligent tutor system asks continuous questions in a dialogue to test a student's understanding of a piece of English short text. The conversation history is modeled through the multi-input encoder and the coreference alignment model, so that the generated questions are coherent with the conversation history and the entity nouns (entities) in the conversation history can be converted into the corresponding pronouns in the generated questions, which improves the conversational quality of the questions. In addition, through the dialogue flow module provided by the embodiment of the application, the topic can be gradually transferred across a multi-round dialogue, so that questions in multi-round question answering are better generated.
It should be noted that the question generation method provided by the embodiment of the present application is not limited to a specific network model, and may be used not only in an encoder-decoder model but also in models such as the Transformer; meanwhile, the question generation method provided by the embodiment of the application is not limited to the intelligent tutor system: the coreference alignment method can be used in any model that needs to convert an entity in the input into a pronoun, and the dialogue flow module can be used in general dialogue scenarios, such as a chat robot or a customer service robot, in which case the chat dialogue history serves as the basis of the input to the chapter encoder.
An exemplary architecture of the software modules is described below, and in some embodiments, as shown in FIG. 2, the software modules 80 in the apparatus 440 may include:
the first encoding module 81 is configured to encode, by using a first encoding model, a first word vector, an answer information vector, and a text position vector corresponding to a current question-answer number corresponding to a reference text, so as to obtain a first semantic vector sequence;
the second encoding module 82 is configured to encode a second word vector corresponding to the historical question-answering text through a second encoding model to obtain a second semantic vector sequence;
the decoding module 83 is configured to decode the first semantic vector sequence and the second semantic vector sequence through a decoding model to obtain a question text corresponding to the current question-answer number;
an output module 84 for outputting the question text.
In some embodiments, the apparatus further comprises:
the first acquisition module is used for acquiring original word vectors and attribute information corresponding to each word in the reference text, wherein the attribute information represents whether the word is an answer or not;
the first mapping module is used for mapping an original word vector corresponding to the word segmentation into a first word vector when the attribute information of the word segmentation indicates that the word segmentation is not an answer;
And the second mapping module is used for mapping the original word vector corresponding to the word segmentation into an answer information vector when the attribute information of the word segmentation indicates that the word segmentation is an answer.
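A minimal Python sketch of the two mapping modules described above, assuming each token carries a binary answer tag and that the two mappings are learned linear projections (both the projection form and all names are illustrative assumptions):

```python
import torch
import torch.nn as nn

class AnswerTagMapper(nn.Module):
    # Maps each original word vector either to a "first word vector" (token is
    # not part of the answer) or to an "answer information vector" (token is
    # marked as the answer), according to its attribute information.
    def __init__(self, dim):
        super().__init__()
        self.to_word = nn.Linear(dim, dim)     # mapping for non-answer tokens
        self.to_answer = nn.Linear(dim, dim)   # mapping for answer tokens

    def forward(self, word_vecs, is_answer):
        # word_vecs: [seq_len, dim] original word vectors
        # is_answer: bool tensor [seq_len] from the attribute information
        return torch.where(is_answer.unsqueeze(-1),
                           self.to_answer(word_vecs),
                           self.to_word(word_vecs))
```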
In other embodiments, the apparatus further comprises:
the splicing module is used for splicing the first word vector, the answer information vector and the text position vector to obtain a spliced vector;
the first encoding module 81 is further configured to convert the spliced vector into an intermediate vector with a fixed length through the first encoding model, and encode sequence information of the spliced vector in the intermediate vector to obtain a first semantic vector sequence.
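A sketch of the splicing module plus the first encoding model, assuming the encoder is a bidirectional GRU (the concrete recurrent unit and all names are assumptions made for illustration):

```python
import torch
import torch.nn as nn

class FirstEncoder(nn.Module):
    # Concatenates the first word vector, the answer information vector and the
    # text position vector, then encodes the spliced sequence.
    def __init__(self, word_dim, ans_dim, pos_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.GRU(word_dim + ans_dim + pos_dim, hidden_dim,
                          bidirectional=True, batch_first=True)

    def forward(self, word_vec, ans_vec, pos_vec):
        # each input: [batch, seq_len, *_dim]
        spliced = torch.cat([word_vec, ans_vec, pos_vec], dim=-1)  # spliced vector
        outputs, h_n = self.rnn(spliced)
        # outputs: the first semantic vector sequence (sequence order encoded by the GRU)
        # h_n: a fixed-length intermediate representation of the whole sequence
        return outputs, h_n
```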
In other embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring answer texts corresponding to the question texts;
the round number updating module is used for updating the value of the current question-answer round number to the value of the next question-answer round number when the answer text and the preset answer text meet the matching relation;
and the round number maintaining module is used for maintaining the value of the current question-answer round number when the answer text and the preset answer text do not meet the matching relation.
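A minimal sketch of the round-number bookkeeping performed by these two modules; the exact matching test between the received answer and the preset answer is an assumption (here a simple normalized string comparison):

```python
def update_round(current_round, answer_text, preset_answer):
    # Advance to the next question-answer round only when the received answer
    # matches the preset answer; otherwise keep asking within the current round.
    if answer_text.strip().lower() == preset_answer.strip().lower():
        return current_round + 1
    return current_round
```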
In other embodiments, the decoding module 83 is further configured to decode the first semantic vector sequence and the second semantic vector sequence word by word through a decoding model;
and concentrating the chapter position attention distribution of the decoding model on the text corresponding to the current round number, and, when a pronoun needs to be generated, concentrating the attention distribution of the decoding model on the entity nouns in the second semantic vector sequence, so as to generate the coreference-aligned question text corresponding to the current round number.
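A sketch of concentrating the chapter position attention on the text corresponding to the current round, implemented here as additive masking before the softmax; the masking scheme itself is an assumption rather than the patented mechanism:

```python
import torch

def focused_attention(scores, allowed_mask):
    # scores: [seq_len] raw attention scores over chapter positions
    # allowed_mask: bool [seq_len], True for positions the current round should ask about
    masked = scores.masked_fill(~allowed_mask, float("-inf"))
    return torch.softmax(masked, dim=-1)  # probability mass concentrates on the allowed positions
```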
In other embodiments, the apparatus further comprises:
the third acquisition module is used for acquiring a third semantic vector sequence corresponding to the initial decoding model and the training text;
the decoding module 83 is further configured to, when the third semantic vector sequence is decoded by the decoding model and it is determined that a pronoun needs to be generated, adjust parameters of the decoding model according to a first optimization objective function, so that attention distribution of the decoding model is focused on entity nouns.
In other embodiments, the apparatus further comprises:
the training module is used for carrying out joint training on the decoding model at least according to the first optimization objective function and the second optimization objective function so as to adjust the parameters of the decoding model;
the first optimization objective function is used for focusing the attention distribution of the decoding model on entity nouns when pronouns are required to be generated, and the second optimization objective function is used for optimizing the chapter position attention distribution corresponding to each round of question-answer dialogue, so that the chapter position attention distribution of the decoding model is focused on texts corresponding to the current question-answer round number.
As an example of a hardware implementation, the method provided by the embodiment of the present application may be directly executed by the processor 410 in the form of a hardware decoding processor, for example, by one or more application specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field programmable gate arrays (FPGA, Field-Programmable Gate Array), or other electronic components.
Embodiments of the present application provide a storage medium having stored therein executable instructions which, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, the methods shown in FIG. 3 and FIG. 4.
In some embodiments, the storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or a CD-ROM; or it may be any of various devices including one of, or any combination of, the above memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a hypertext markup language (HTML, hyper Text Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or, alternatively, distributed across multiple sites and interconnected by a communication network.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (13)

1. A problem generating method, the method comprising:
the method comprises the steps of encoding a first word vector, an answer information vector and a text position vector corresponding to a current question-answer round number corresponding to a reference text through a first encoding model to obtain a first semantic vector sequence, wherein the first word vector is a vector representation of all words in a continuous word space in the reference text or a vector representation of words which are not answers in the reference text in a continuous word space, the answer information vector is a vector representation of words marked as answers in the reference text in the continuous word space, and the text position vector corresponding to the current question-answer round number is generated according to text position information corresponding to the current question-answer round number;
Encoding a second word vector corresponding to the historical question-answering text through a second encoding model to obtain a second semantic vector sequence;
performing word-by-word decoding on the first semantic vector sequence and the second semantic vector sequence through a decoding model; concentrating the chapter position attention distribution of the decoding model on the text corresponding to the current question-answer round number in the word-by-word decoding process, and concentrating the attention distribution of the decoding model on entity nouns in the second semantic vector sequence when a pronoun is decoded and output, so as to generate a coreference-aligned question text corresponding to the current question-answer round number;
and outputting the problem text.
2. The method as recited in claim 1, wherein the method further comprises:
acquiring original word vectors and attribute information corresponding to each word in a reference text, wherein the attribute information characterizes whether the word is an answer or not;
when the attribute information of the word segmentation indicates that the word segmentation is not an answer, mapping an original word vector corresponding to the word segmentation into a first word vector;
when the attribute information of the word segmentation indicates that the word segmentation is an answer, mapping an original word vector corresponding to the word segmentation into an answer information vector.
3. The method of claim 1, wherein the encoding, by the first encoding model, the first word vector, the answer information vector, and the text position vector corresponding to the current number of question-answering rounds corresponding to the obtained reference text to obtain the first semantic vector sequence includes:
splicing the first word vector, the answer information vector and the text position vector to obtain a spliced vector;
converting the spliced vector into an intermediate vector with a fixed length through the first coding model;
and encoding the sequence information of the spliced vector in the intermediate vector to obtain a first semantic vector sequence.
4. The method of claim 1, wherein after outputting the question text, the method further comprises:
obtaining answer texts corresponding to the question texts;
when the answer text and the preset answer text meet the matching relation, updating the value of the current question-answer round number to the value of the next question-answer round number;
and when the answer text and the preset answer text do not meet the matching relation, maintaining the value of the current question-answer round number.
5. The method according to any one of claims 1 to 4, further comprising:
Acquiring a third semantic vector sequence corresponding to the initial decoding model and the training text;
and when the third semantic vector sequence is decoded through the decoding model and the pronouns are determined to be generated, adjusting parameters of the decoding model according to a first optimization objective function so as to concentrate the attention distribution of the decoding model to entity nouns.
6. The method according to any one of claims 1 to 4, further comprising:
performing joint training on the decoding model at least according to a first optimization objective function and a second optimization objective function so as to adjust parameters of the decoding model;
the first optimization objective function is used for focusing the attention distribution of the decoding model on entity nouns when pronouns are required to be generated, and the second optimization objective function is used for optimizing the chapter position attention distribution corresponding to each round of question-answer dialogue, so that the chapter position attention distribution of the decoding model is focused on texts corresponding to the current question-answer round number.
7. A problem generating apparatus, comprising:
the first coding module is used for coding a first word vector corresponding to a reference text, an answer information vector and a text position vector corresponding to the current question-answer number through a first coding model to obtain a first semantic vector sequence, wherein the first word vector is a vector representation of all words in a continuous word space in the reference text or a vector representation of words which are not answers in the reference text in the continuous word space, the answer information vector is a vector representation of words marked as answers in the reference text in the continuous word space, and the text position vector corresponding to the current question-answer number is generated according to the text position information corresponding to the current question-answer number;
The second coding module is used for coding a second word vector corresponding to the historical question-answering text through a second coding model to obtain a second semantic vector sequence;
the decoding module is used for decoding the first semantic vector sequence and the second semantic vector sequence word by word through a decoding model; concentrating the chapter position attention distribution of the decoding model on the text corresponding to the current question-answer round number in the word-by-word decoding process, and concentrating the attention distribution of the decoding model on entity nouns in the second semantic vector sequence when a pronoun is decoded and output, so as to generate a coreference-aligned question text corresponding to the current question-answer round number;
and the output module is used for outputting the problem text.
8. The apparatus as recited in claim 7, wherein the apparatus further comprises:
the first acquisition module is used for acquiring original word vectors and attribute information corresponding to each word in the reference text, wherein the attribute information represents whether the word is an answer or not;
the first mapping module is used for mapping an original word vector corresponding to the word segmentation into a first word vector when the attribute information of the word segmentation indicates that the word segmentation is not an answer;
And the second mapping module is used for mapping the original word vector corresponding to the word segmentation into an answer information vector when the attribute information of the word segmentation indicates that the word segmentation is an answer.
9. The apparatus as recited in claim 7, wherein the apparatus further comprises:
the splicing module is used for splicing the first word vector, the answer information vector and the text position vector to obtain a spliced vector;
the first coding module is further configured to convert the spliced vector into an intermediate vector with a fixed length through the first coding model, and code sequence information of the spliced vector in the intermediate vector to obtain a first semantic vector sequence.
10. The apparatus as recited in claim 7, wherein the apparatus further comprises:
the second acquisition module is used for acquiring answer texts corresponding to the question texts;
the round number updating module is used for updating the value of the current question-answer round number to the value of the next question-answer round number when the answer text and the preset answer text meet the matching relation;
and the round number maintaining module is used for maintaining the value of the current question-answer round number when the answer text and the preset answer text do not meet the matching relation.
11. The apparatus according to any one of claims 7 to 10, further comprising:
the third acquisition module is used for acquiring a third semantic vector sequence corresponding to the initial decoding model and the training text;
and the decoding module is also used for decoding the third semantic vector sequence through the decoding model, and adjusting parameters of the decoding model according to the first optimization objective function when the generation of the pronouns is determined, so that the attention distribution of the decoding model is concentrated to entity nouns.
12. A problem generating apparatus, characterized by comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 6 when executing executable instructions stored in said memory.
13. A storage medium having stored thereon executable instructions for causing a processor to perform the method of any one of claims 1 to 6.
CN201910447602.4A 2019-05-27 2019-05-27 Question generation method, device, equipment and storage medium Active CN110162613B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010169926.9A CN111414464B (en) 2019-05-27 2019-05-27 Question generation method, device, equipment and storage medium
CN201910447602.4A CN110162613B (en) 2019-05-27 2019-05-27 Question generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910447602.4A CN110162613B (en) 2019-05-27 2019-05-27 Question generation method, device, equipment and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010169926.9A Division CN111414464B (en) 2019-05-27 2019-05-27 Question generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110162613A CN110162613A (en) 2019-08-23
CN110162613B true CN110162613B (en) 2023-12-01

Family

ID=67629063

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010169926.9A Active CN111414464B (en) 2019-05-27 2019-05-27 Question generation method, device, equipment and storage medium
CN201910447602.4A Active CN110162613B (en) 2019-05-27 2019-05-27 Question generation method, device, equipment and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010169926.9A Active CN111414464B (en) 2019-05-27 2019-05-27 Question generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (2) CN111414464B (en)

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414464B (en) * 2019-05-27 2023-04-07 腾讯科技(深圳)有限公司 Question generation method, device, equipment and storage medium
WO2021051091A1 (en) 2019-09-13 2021-03-18 Rad Al, Inc. Method and system for automatically generating a section in a radiology report
CN112560398B (en) * 2019-09-26 2023-07-04 百度在线网络技术(北京)有限公司 Text generation method and device
CN110750677B (en) * 2019-10-12 2023-11-14 腾讯科技(深圳)有限公司 Audio and video identification method and system based on artificial intelligence, storage medium and server
CN110674279A (en) * 2019-10-15 2020-01-10 腾讯科技(深圳)有限公司 Question-answer processing method, device, equipment and storage medium based on artificial intelligence
CN110795549B (en) * 2019-10-31 2023-03-17 腾讯科技(深圳)有限公司 Short text conversation method, device, equipment and storage medium
CN110909142B (en) * 2019-11-20 2023-03-31 腾讯科技(深圳)有限公司 Question and sentence processing method and device of question-answer model, electronic equipment and storage medium
CN111061851B (en) * 2019-12-12 2023-08-08 中国科学院自动化研究所 Question generation method and system based on given facts
CN112905754A (en) * 2019-12-16 2021-06-04 腾讯科技(深圳)有限公司 Visual conversation method and device based on artificial intelligence and electronic equipment
CN111291169B (en) * 2020-01-16 2024-05-28 中国平安人寿保险股份有限公司 Method, device, equipment and storage medium for template editing reply
CN111428467B (en) * 2020-02-19 2024-05-07 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating problem questions for reading and understanding
CN111382563B (en) * 2020-03-20 2023-09-08 腾讯科技(深圳)有限公司 Text relevance determining method and device
CN111414737B (en) * 2020-03-23 2022-03-08 腾讯科技(深圳)有限公司 Story generation model training method, device, equipment and storage medium
CN111444399B (en) * 2020-03-30 2022-10-25 腾讯科技(深圳)有限公司 Reply content generation method, device, equipment and readable storage medium
CN111553159B (en) * 2020-04-24 2021-08-06 中国科学院空天信息创新研究院 Question generation method and system
CN111639163A (en) * 2020-04-29 2020-09-08 深圳壹账通智能科技有限公司 Problem generation model training method, problem generation method and related equipment
CN113822016B (en) * 2020-06-19 2024-03-22 阿里巴巴集团控股有限公司 Text data processing method and device, electronic equipment and readable storage medium
CN111967224A (en) * 2020-08-18 2020-11-20 深圳市欢太科技有限公司 Method and device for processing dialog text, electronic equipment and storage medium
CN112149426B (en) * 2020-09-27 2024-02-09 腾讯科技(深圳)有限公司 Reading task processing method and related equipment
CN112199481B (en) * 2020-09-30 2023-06-16 中国人民大学 Single-user personalized dialogue method and system adopting PCC dialogue model
CN112364665A (en) * 2020-10-11 2021-02-12 广州九四智能科技有限公司 Semantic extraction method and device, computer equipment and storage medium
US11615890B2 (en) 2021-03-09 2023-03-28 RAD AI, Inc. Method and system for the computer-assisted implementation of radiology recommendations
CN113051371B (en) * 2021-04-12 2023-02-07 平安国际智慧城市科技股份有限公司 Chinese machine reading understanding method and device, electronic equipment and storage medium
CN113268561B (en) * 2021-04-25 2021-12-14 中国科学技术大学 Problem generation method based on multi-task joint training
CN113239160B (en) * 2021-04-29 2022-08-12 桂林电子科技大学 Question generation method and device and storage medium
CN113255928B (en) * 2021-04-29 2022-07-05 支付宝(杭州)信息技术有限公司 Model training method and device and server
CN113282722B (en) * 2021-05-07 2024-03-29 中国科学院深圳先进技术研究院 Machine reading and understanding method, electronic device and storage medium
CN113204964B (en) * 2021-05-31 2024-03-08 平安科技(深圳)有限公司 Data processing method, system, electronic equipment and storage medium
CN113239173B (en) * 2021-06-09 2023-12-12 深圳集智数字科技有限公司 Question-answer data processing method and device, storage medium and electronic equipment
CN115600587B (en) * 2022-12-16 2023-04-07 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Mathematics application question generation system and method, intelligent terminal and readable storage medium
CN116863935B (en) * 2023-09-04 2023-11-24 深圳有咖互动科技有限公司 Speech recognition method, device, electronic equipment and computer readable medium
CN117473076B (en) * 2023-12-27 2024-03-08 广东信聚丰科技股份有限公司 Knowledge point generation method and system based on big data mining
CN118052222A (en) * 2024-04-15 2024-05-17 北京晴数智慧科技有限公司 Method and device for generating multi-round dialogue data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6952665B1 (en) * 1999-09-30 2005-10-04 Sony Corporation Translating apparatus and method, and recording medium used therewith
CN106776578A (en) * 2017-01-03 2017-05-31 竹间智能科技(上海)有限公司 Method and device for improving the dialogue performance of a conversational system
CN108846130A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 A kind of question text generation method, device, equipment and medium
CN109063174A (en) * 2018-08-21 2018-12-21 腾讯科技(深圳)有限公司 Method and device for generating query answers, computer storage medium, and electronic equipment
KR20190023316A (en) * 2017-08-28 2019-03-08 주식회사 솔트룩스 Question-answering system based dialogue model
CN109766424A (en) * 2018-12-29 2019-05-17 安徽省泰岳祥升软件有限公司 Filtering method and device for reading comprehension model training data

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2446427A (en) * 2007-02-07 2008-08-13 Sharp Kk Computer-implemented learning method and apparatus
US20140108321A1 (en) * 2012-10-12 2014-04-17 International Business Machines Corporation Text-based inference chaining
CN108509411B (en) * 2017-10-10 2021-05-11 腾讯科技(深圳)有限公司 Semantic analysis method and device
CN109271496B (en) * 2018-08-30 2021-12-24 广东工业大学 Natural question-answering method based on text, knowledge base and sequence-to-sequence
CN109670029B (en) * 2018-12-28 2021-09-07 百度在线网络技术(北京)有限公司 Method, apparatus, computer device and storage medium for determining answers to questions
CN109766423A (en) * 2018-12-29 2019-05-17 上海智臻智能网络科技股份有限公司 Neural network-based question answering method and device, storage medium, and terminal
CN111414464B (en) * 2019-05-27 2023-04-07 腾讯科技(深圳)有限公司 Question generation method, device, equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6952665B1 (en) * 1999-09-30 2005-10-04 Sony Corporation Translating apparatus and method, and recording medium used therewith
CN106776578A (en) * 2017-01-03 2017-05-31 竹间智能科技(上海)有限公司 Method and device for improving the dialogue performance of a conversational system
KR20190023316A (en) * 2017-08-28 2019-03-08 주식회사 솔트룩스 Question-answering system based dialogue model
CN108846130A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 A kind of question text generation method, device, equipment and medium
CN109063174A (en) * 2018-08-21 2018-12-21 腾讯科技(深圳)有限公司 Method and device for generating query answers, computer storage medium, and electronic equipment
CN109766424A (en) * 2018-12-29 2019-05-17 安徽省泰岳祥升软件有限公司 Filtering method and device for reading comprehension model training data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Neural Network Machine Reading Comprehension Model Based on Data Reconstruction and Rich Features; Yin Yichun; Zhang Ming; Journal of Chinese Information Processing (11); pp. 112-116 *

Also Published As

Publication number Publication date
CN111414464A (en) 2020-07-14
CN110162613A (en) 2019-08-23
CN111414464B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN110162613B (en) Question generation method, device, equipment and storage medium
US11481562B2 (en) Method and apparatus for evaluating translation quality
Tuan et al. Capturing greater context for question generation
Wang et al. Improving knowledge-aware dialogue generation via knowledge base question answering
CN112765345A (en) Text abstract automatic generation method and system fusing pre-training model
CN116596347B (en) Multi-disciplinary interaction teaching system and teaching method based on cloud platform
Liu et al. Calibrating llm-based evaluator
CN110263143A (en) Improve the neurologic problems generation method of correlation
Kim et al. Overview of the eighth dialog system technology challenge: DSTC8
Muis et al. Sequence-to-sequence learning for indonesian automatic question generator
Wilske Form and meaning in dialog-based computer-assisted language learning
CN117370190A (en) Test case generation method and device, electronic equipment and storage medium
Wang et al. A spoken translation game for second language learning
Liao et al. Question generation through transfer learning
CN114492464A (en) Dialog generation method and system based on bidirectional asynchronous sequence
CN114372140A (en) Layered conference abstract generation model training method, generation method and device
Lv et al. StyleBERT: Chinese pretraining by font style information
Sun Machine reading comprehension: challenges and approaches
Chen et al. Using multiple encoders for chinese neural question generation from the knowledge base
Mazza et al. Behavioural simulator for professional training based on natural language interaction
CN114610861B (en) End-to-end dialogue method integrating knowledge and emotion based on variational self-encoder
CN117972434B (en) Training method, training device, training equipment, training medium and training program product for text processing model
Luckin Combining bayesian networks and formal reasoning for semantic classification of student utterances
Lee et al. Intention-based Corrective Feedback Generation using Context-aware Model.
CN117854071A (en) Image description text determining method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant