CN112988967A - Dialog generation method and device based on two-stage decoding, medium and computing equipment - Google Patents


Info

Publication number: CN112988967A
Application number: CN202110248798.1A
Authority: CN (China)
Prior art keywords: context, real word sequence, vector, word
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 蔡毅 (Cai Yi), 钟志成 (Zhong Zhicheng), 孔俊生 (Kong Junsheng)
Current assignee: South China University of Technology SCUT (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: South China University of Technology SCUT
Application filed by South China University of Technology SCUT
Priority to CN202110248798.1A
Publication of CN112988967A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3344: Query execution using natural language analysis
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G06F16/3347: Query execution using vector based model
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/047: Probabilistic or stochastic networks

Abstract

The invention discloses a dialogue generation method, device, medium and computing device based on two-stage decoding. The method divides the reply generation process of a dialogue into two decoding stages. First, the dialogue context is input into the dialogue generation model and mapped into word embedding vectors. The word vectors are then input into a context self-attention encoder to obtain the feature vector of the dialogue context, and that feature vector is input into a first-stage Transformer decoder, which decodes it to generate a sequence of real words (content words). The real word sequence is input into a real word sequence encoder to obtain its feature vector. Finally, the feature vectors of the context and of the real word sequence are input together into a second-stage Transformer decoder, which decodes them to generate the final reply. Through the two-stage decoding process, the invention prevents virtual words (function words), which occur frequently but carry little semantic information, from interfering with the real words, thereby improving the relevance and informativeness of the reply.

Description

Dialog generation method and device based on two-stage decoding, medium and computing equipment
Technical Field
The invention relates to the technical field of natural language processing, in particular to a dialog generation method and device based on two-stage decoding, a medium and computing equipment.
Background
In recent years, with the development of deep learning technology and the appearance of large dialogue datasets, open-domain dialogue systems can be built with deep learning techniques, greatly expanding the application scenarios of dialogue systems.
In the field of open-domain dialogue generation, the current mainstream practice is based on an end-to-end generation framework: an encoder encodes the dialogue context into a feature vector, and a decoder decodes the dialogue reply from that vector. However, basic end-to-end dialogue generation models tend to produce generic, uninformative replies such as "OK" or "I don't know". In recent years many solutions to this problem have been proposed, such as adding personalized information, topic information, or external knowledge so that the model better understands the semantics of the context. However, these methods still use a single decoder to generate the whole dialogue reply in one pass, without distinguishing real words (content words) from virtual words (function words). As a result, the model tends to generate virtual words, which carry little semantic information but occur frequently, rather than real words, which carry more semantic information but occur less frequently, so the model still produces generic, uninformative replies.
Disclosure of Invention
A first object of the present invention is to overcome the disadvantages of the prior art and to provide a dialogue generation method based on two-stage decoding. In this method, the sequence of real words required in the reply is generated in the first decoding stage, and the final dialogue reply is generated in the second decoding stage based on the generated real word sequence and the context information. This effectively avoids the influence of high-frequency virtual words on the generation of low-frequency real words, improving the relevance and information content of the reply while preserving its fluency.
A second object of the present invention is to provide a dialog generating device based on two-stage decoding.
A third object of the present invention is to provide a computer-readable storage medium.
It is a fourth object of the invention to provide a computing device.
The first purpose of the invention can be realized by the following technical scheme:
A dialogue generation method based on two-stage decoding, which generates a dialogue through a two-stage-decoding dialogue generation model comprising two self-attention encoders and two Transformer decoders, the method comprising the steps of:
(1) inputting the text of the dialogue context into the model and mapping each word in the text into a word embedding vector;
(2) inputting the word embedding vectors, sentence by sentence, into a context self-attention encoder, which extracts the feature vector of the context;
(3) inputting the obtained context feature vector into a first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply;
(4) inputting the obtained real word sequence into a real word sequence self-attention encoder to obtain the feature vector of the real word sequence;
(5) inputting the encoded feature vectors of the context and of the real word sequence into a second-stage Transformer decoder, which decodes them to generate the final reply.
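As a sketch only, the data flow of steps (1) to (5) can be written as a single pipeline. Every function below (embed, encode, decode_stage1, decode_stage2) is a trivial stand-in invented for illustration, not the patent's trained modules:

```python
# Illustrative data flow of the five steps; each component below is a
# trivial stand-in for the corresponding trained module, not the real model.

def embed(context):                        # step (1): map words to vectors
    return [float(len(w)) for sent in context for w in sent.split()]

def encode(vectors):                       # steps (2)/(4): self-attention encoder
    return sum(vectors) / max(len(vectors), 1)

def decode_stage1(ctx_feat):               # step (3): real (content) word sequence
    return ["weather", "sunny"]

def decode_stage2(ctx_feat, cw_feat):      # step (5): final reply
    return "the weather is sunny today"

def generate_reply(context):
    ctx_feat = encode(embed(context))                      # (1) + (2)
    real_words = decode_stage1(ctx_feat)                   # (3)
    cw_feat = encode([float(len(w)) for w in real_words])  # (4)
    return decode_stage2(ctx_feat, cw_feat)                # (5)

reply = generate_reply(["how is the weather today"])
```

The point of the decomposition is that stage (3) commits to the semantically heavy content words before stage (5) fills in the fluent surface form around them.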
Preferably, in step (1), the word embedding vector to which the ith word in the text is mapped is represented as:

Emb(x_i) = e(x_i) + PE(i)

where i denotes the position of the word in the text; x_i denotes the ith word in the text; e(x_i) denotes the initial word embedding vector to which x_i is mapped; and PE(i) denotes the sinusoidal position encoding function, which encodes position i into a vector so that the model can exploit the order information of the text. Specifically, let d be the dimension of the final vector; the even vector positions take the value PE(i, 2z) = sin(i / 10000^(2z/d)) and the odd vector positions take the value PE(i, 2z+1) = cos(i / 10000^(2z/d)), where z ∈ [0, ..., d/2], finally yielding a position code of dimension d.
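A minimal NumPy sketch of the sinusoidal position encoding described above; the base constant 10000 follows the standard Transformer formulation, and the function name is ours:

```python
import numpy as np

def positional_encoding(i, d):
    """Sinusoidal position code PE(i) of even dimension d."""
    pe = np.zeros(d)
    z = np.arange(d // 2)
    pe[0::2] = np.sin(i / 10000 ** (2 * z / d))  # even positions: PE(i, 2z)
    pe[1::2] = np.cos(i / 10000 ** (2 * z / d))  # odd positions: PE(i, 2z+1)
    return pe

# word embedding = initial embedding + position code (same dimension d)
d = 8
e_xi = np.ones(d)                  # stand-in for the initial embedding e(x_i)
emb = e_xi + positional_encoding(3, d)
```

Because PE depends only on the position i, two occurrences of the same word at different positions receive different final embeddings.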
Preferably, in step (2), the extracted context feature vector is:

PSAE(U) = FFN(G)
G = MultiHead(In_s, In_s, In_s)
In_s = [s_1, s_2, ..., s_I]

where In_s = [s_1, s_2, ..., s_I] denotes the matrix formed by all word embedding vectors of the input text, whose length is I; MultiHead(In_s, In_s, In_s) denotes feeding In_s into a multi-head self-attention function as Q, K and V, with G the final output feature of that function; and FFN(G) denotes feeding the feature G obtained from the multi-head self-attention function into the fully connected layer FFN, giving the final context feature vector PSAE(U) = FFN(G), where U = [x_1, ..., x_L] denotes the input dialogue context and L is its total number of words.
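A minimal sketch of the encoder computation PSAE(U) = FFN(MultiHead(In_s, In_s, In_s)), simplified to a single attention head with no learned Q/K/V projections and random FFN weights; it shows only the shape flow, not the trained model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # Scaled dot-product self-attention with Q = K = V = X,
    # i.e. MultiHead(In_s, In_s, In_s) reduced to one head, no projections.
    d = X.shape[-1]
    A = softmax(X @ X.T / np.sqrt(d))   # attention weights, rows sum to 1
    return A @ X

def ffn(G, W1, b1, W2, b2):
    # Position-wise fully connected layer FFN(G) with a ReLU activation.
    return np.maximum(G @ W1 + b1, 0) @ W2 + b2

rng = np.random.default_rng(0)
In_s = rng.normal(size=(5, 8))          # 5 word-embedding vectors, d = 8
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)
PSAE_U = ffn(self_attention(In_s), W1, b1, W2, b2)
```

The same two-layer pattern (self-attention followed by FFN) is reused by the real word sequence encoder CwSAE in step (4), only with In_c as input.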
Preferably, in step (3), when generating the ith real word c_i, the matrix formed by all word embedding vectors of the already generated real word sequence c_{≤i-1} is denoted In_c^{(i)}.

During decoding, the first-stage Transformer decoder passes through two multi-head attention layers, namely:

H'_1^{(i)} = MultiHead(In_c^{(i)}, In_c^{(i)}, In_c^{(i)})
H_1^{(i)} = MultiHead(H'_1^{(i)}, PSAE(U), PSAE(U))

where the subscript 1 denotes the first decoding stage and PSAE(U) denotes the extracted context feature vector.

To obtain the ith real word c_i, H_1^{(i)} must further be processed by a fully connected layer and a softmax layer, as follows:

F_1^{(i)} = FFN(H_1^{(i)})
P(c_i) = softmax(F_1^{(i)})

Finally, the ith real word c_i is generated according to the probability distribution P(c_i).

By analogy, the first-stage Transformer decoder finally decodes the whole real word sequence C = [c_1, ..., c_K], where K is the total number of real words in the sequence.
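The word-by-word (autoregressive) first-stage decoding can be sketched as a greedy loop. decode_step below is a toy stand-in for the decoder stack: the two attention layers are replaced by a simple mixing rule, and the vocabulary size is set equal to the feature dimension purely for brevity:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(prefix_ids, ctx_feat, W):
    # Toy stand-in for self-attention over the generated prefix,
    # cross-attention to PSAE(U), FFN, then softmax -> P(c_i).
    h = ctx_feat + sum(W[t] for t in prefix_ids)
    return softmax(h)

vocab, d = 10, 10                       # vocab == d only to keep the toy short
rng = np.random.default_rng(1)
W = rng.normal(size=(vocab, d))         # toy "embedding" rows per token id
ctx_feat = rng.normal(size=d)           # stand-in for PSAE(U)

BOS, EOS, seq = 0, 1, [0]
for _ in range(5):                      # generate at most K = 5 real words
    p = decode_step(seq, ctx_feat, W)   # probability distribution P(c_i)
    c_i = int(np.argmax(p))             # greedy choice of the next real word
    seq.append(c_i)
    if c_i == EOS:
        break
```

Each iteration conditions on everything generated so far, mirroring how In_c^{(i)} grows by one embedding per step.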
Preferably, in step (4), the feature vector of the real word sequence extracted by the real word sequence self-attention encoder is:

CwSAE(C) = FFN(G)
G = MultiHead(In_c, In_c, In_c)
In_c = [c_1, c_2, ..., c_K]

where In_c = [c_1, c_2, ..., c_K] denotes the matrix formed by all word embedding vectors of the real word sequence; MultiHead(In_c, In_c, In_c) and FFN(G) denote the multi-head self-attention function and the fully connected function respectively, giving the final real word sequence feature vector CwSAE(C).
Preferably, in step (5), when generating the ith reply word r_i, the matrix formed by all word embedding vectors of the already generated reply sequence r_{≤i-1} is denoted In_r^{(i)}.

During decoding, the second-stage Transformer decoder likewise passes through two multi-head attention layers, namely:

H'_2^{(i)} = MultiHead(In_r^{(i)}, In_r^{(i)}, In_r^{(i)})
H_2^{(i)} = MultiHead(H'_2^{(i)}, [PSAE(U); CwSAE(C)], [PSAE(U); CwSAE(C)])

where the subscript 2 denotes the second decoding stage, CwSAE(C) denotes the extracted feature vector of the real word sequence, and [PSAE(U); CwSAE(C)] denotes the context and real word sequence feature vectors supplied together to the attention layer.

To obtain the ith reply word r_i, H_2^{(i)} must further be processed by a fully connected layer and a softmax layer, as follows:

F_2^{(i)} = FFN(H_2^{(i)})
P(r_i) = softmax(F_2^{(i)})

Finally, the ith reply word r_i is generated according to the probability distribution P(r_i).

By analogy, the second-stage Transformer decoder finally decodes the whole reply sequence R = [r_1, ..., r_J], where J is the total number of words in the reply.
Preferably, the model is trained with the context sentences of existing dialogues, the real word sequences and the dialogue replies obtained through steps (1) to (5), yielding the optimal model.

During training, the loss functions of the two decoding stages are computed separately:

L_1 = -Σ_{i=1}^{K} log P(c_i)
L_2 = -Σ_{j=1}^{J} log P(r_j)

giving the joint loss function:

L_total = L_1 + L_2

The model parameters are then updated by back propagation based on the joint loss function L_total, yielding the final optimal dialogue generation model.
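A small numeric sketch of the two stage losses and their sum, assuming the standard negative log-likelihood reading of L_1 and L_2; the per-step probability distributions are made-up toy values:

```python
import numpy as np

def nll(probs, targets):
    # Negative log-likelihood of one decoded sequence: -sum_i log P(w_i),
    # where probs[i] is the model's distribution at step i.
    return -float(np.sum(np.log([p[t] for p, t in zip(probs, targets)])))

# Toy per-step distributions P(c_i) and P(r_j) over a 4-word vocabulary.
P_c = np.array([[0.7, 0.1, 0.1, 0.1],
                [0.1, 0.8, 0.05, 0.05]])   # K = 2 real-word steps
P_r = np.array([[0.6, 0.2, 0.1, 0.1]])     # J = 1 reply-word step

L1 = nll(P_c, [0, 1])      # stage-1 loss over the real word sequence
L2 = nll(P_r, [0])         # stage-2 loss over the reply
L_total = L1 + L2          # joint loss, back-propagated to update parameters
```

Summing the two losses trains both decoders jointly, so gradients from the final reply also flow into the modules shared with the first stage.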
The second purpose of the invention can be realized by the following technical scheme:
a dialog generating device based on two-stage decoding comprises a mapping module, a context self-attention encoder, a first-stage Transformer decoder, a real word sequence self-attention encoder and a second-stage Transformer decoder which are sequentially connected, wherein the context self-attention encoder is also connected to the second-stage Transformer decoder;
the mapping module is used for taking a text of a conversation context as input and mapping each word in the text into a word embedding vector;
the context self-attention encoder is used for taking a sentence as a unit, embedding words into a vector as input, and extracting a feature vector of a context;
the first stage Transformer decoder is used for decoding the context feature vector as an input to generate a real word sequence, and the real word sequence expresses main semantic information in final reply;
the real word sequence self-attention encoder is used for inputting the real word sequence and encoding to obtain a characteristic vector of the real word sequence;
The second-stage Transformer decoder takes the feature vector of the context and the feature vector of the real word sequence as input and decodes them to generate the final reply.
The third object of the invention can be achieved by the following technical solutions: a computer-readable storage medium storing a program which, when executed by a processor, implements a dialog generation method based on two-stage decoding according to the first object of the present invention.
The fourth purpose of the invention can be realized by the following technical scheme: a computing device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the dialog generating method based on two-stage decoding according to the first object of the present invention.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention decouples the usual single decoding process into two stages and generates the real word sequence separately, avoiding the influence of virtual words, which occur frequently but lack semantic information, on the real words. This improves the relevance and information content of the generated reply and reduces the generation of generic, uninformative replies.
(2) The dialogue generation model is divided into a context self-attention encoder, a real word sequence self-attention encoder and a two-stage decoder, giving the model better interpretability and making it easier to control: reply generation can be adjusted by adjusting the model's inputs.
Drawings
Fig. 1 is a flowchart of a dialog generation method based on two-stage decoding according to the present invention.
FIG. 2 is a schematic diagram of a dialog generation model.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example 1
This embodiment discloses a dialogue generation method based on two-stage decoding. The method generates a dialogue through a two-stage-decoding dialogue generation model and can be applied to language-response systems in the field of human-computer interaction, such as chat robots and voice-controlled smart home products.
The structure of the dialogue generation model is shown in fig. 2. It mainly comprises two Self-Attention Encoders (SAE), namely a context Self-Attention Encoder (PSAE) and a real word sequence Self-Attention Encoder (CwSAE), which extract features from the context sentences U and the real word sequence C respectively, and two Transformer decoders: a first-stage Transformer decoder and a second-stage Transformer decoder. The context self-attention encoder is connected to both the first-stage and second-stage Transformer decoders, and the first-stage Transformer decoder, the real word sequence self-attention encoder and the second-stage Transformer decoder are connected in sequence.
As shown in fig. 1, the dialog generation method includes the steps of:
(1) The text of a dialogue context (the text contains several sentences) is input into the model, and each word in the text is mapped to a word embedding vector. The words include real words (content words such as verbs and nouns) and virtual words (function words such as modal particles).
The word embedding vector to which the ith word in the text is mapped is represented as:

Emb(x_i) = e(x_i) + PE(i)

where + denotes vector addition; x_i denotes the ith word in the text; e(x_i) denotes the initial word embedding vector to which x_i is mapped; and PE(i) denotes the sinusoidal position encoding function, which encodes position i into a vector so that the model can exploit the order information of the text. Specifically, let d be the dimension of the final vector; the even vector positions take the value PE(i, 2z) = sin(i / 10000^(2z/d)) and the odd vector positions take the value PE(i, 2z+1) = cos(i / 10000^(2z/d)), where z ∈ [0, ..., d/2], finally yielding a position code of dimension d.
(2) The word embedding vectors are input, sentence by sentence, into the context self-attention encoder, which extracts the feature vector of the context.
The extracted context feature vector is:

PSAE(U) = FFN(G)
G = MultiHead(In_s, In_s, In_s)
In_s = [s_1, s_2, ..., s_I]

where In_s = [s_1, s_2, ..., s_I] denotes the matrix formed by all word embedding vectors of the input text, whose length is I; MultiHead(In_s, In_s, In_s) denotes feeding In_s into a multi-head self-attention function as Q, K and V, with G the final output feature of that function; and FFN(G) denotes feeding the feature G into the fully connected layer FFN, giving the final context sentence feature vector PSAE(U) = FFN(G), where U = [x_1, ..., x_L] denotes the input dialogue context and L is its total number of words.
(3) The extracted context feature vector is input into the first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply.
When generating the ith real word c_i, the matrix formed by all word embedding vectors of the already generated real word sequence c_{≤i-1} is denoted In_c^{(i)}.

During decoding, the first-stage Transformer decoder passes through two multi-head attention layers, namely:

H'_1^{(i)} = MultiHead(In_c^{(i)}, In_c^{(i)}, In_c^{(i)})
H_1^{(i)} = MultiHead(H'_1^{(i)}, PSAE(U), PSAE(U))

where the subscript 1 denotes the first decoding stage and PSAE(U) denotes the extracted context feature vector.

To obtain the ith real word c_i, H_1^{(i)} must further be processed by a fully connected layer and a softmax layer, as follows:

F_1^{(i)} = FFN(H_1^{(i)})
P(c_i) = softmax(F_1^{(i)})

Finally, the ith real word c_i is generated according to the probability distribution P(c_i).

By analogy, decoding finally generates the whole real word sequence C = [c_1, ..., c_K], where K is the total number of real words in the sequence.
(4) The real word sequence C obtained by the first-stage decoding is input into the real word sequence self-attention encoder to obtain the feature vector of the real word sequence, expressed as:

CwSAE(C) = FFN(G)
G = MultiHead(In_c, In_c, In_c)
In_c = [c_1, c_2, ..., c_K]

where In_c = [c_1, c_2, ..., c_K] denotes the matrix formed by all word embedding vectors of the real word sequence, and MultiHead(In_c, In_c, In_c) and FFN(G) denote the multi-head self-attention function and the fully connected function respectively: In_c is the input of the multi-head self-attention function, whose final output feature G is fed into the fully connected function FFN, finally giving the real word sequence feature vector CwSAE(C).
(5) The encoded feature vector of the context and the feature vector of the real word sequence are input into the second-stage Transformer decoder, which decodes them to generate the final reply.
When generating the ith reply word r_i, the matrix formed by all word embedding vectors of the already generated reply sequence r_{≤i-1} is denoted In_r^{(i)}.

Similarly to the first-stage decoder, during decoding the second-stage Transformer decoder passes through two multi-head attention layers, namely:

H'_2^{(i)} = MultiHead(In_r^{(i)}, In_r^{(i)}, In_r^{(i)})
H_2^{(i)} = MultiHead(H'_2^{(i)}, [PSAE(U); CwSAE(C)], [PSAE(U); CwSAE(C)])

where the subscript 2 denotes the second decoding stage, and CwSAE(C) denotes the extracted feature vector of the real word sequence.

To obtain the ith reply word r_i, H_2^{(i)} must further be processed by a fully connected layer and a softmax layer, as follows:

F_2^{(i)} = FFN(H_2^{(i)})
P(r_i) = softmax(F_2^{(i)})

Finally, the ith reply word r_i is generated according to the probability distribution P(r_i).

By analogy, decoding generates the whole reply sequence R = [r_1, ..., r_J], where J is the total number of words in the reply.
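Assuming the second cross-attention layer attends over the two feature matrices stacked into a single memory (one plausible reading of "the context and the feature vector of the real word sequence are input together"; the patent's figure may specify a different combination), a single-head sketch without learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, M):
    # Single-head attention of reply states Q over an external memory M.
    d = Q.shape[-1]
    return softmax(Q @ M.T / np.sqrt(d), axis=-1) @ M

rng = np.random.default_rng(2)
H2_prime = rng.normal(size=(3, 8))   # H'_2: self-attended reply prefix states
PSAE_U = rng.normal(size=(5, 8))     # context features
CwSAE_C = rng.normal(size=(4, 8))    # real-word-sequence features

# Stack both feature matrices into one memory so each reply position can
# attend to context tokens and real words at the same time (assumption).
M = np.vstack([PSAE_U, CwSAE_C])
H2 = cross_attention(H2_prime, M)
```

Under this reading, the attention weights decide per reply word how much to copy from the planned real words versus the surrounding context.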
In this embodiment, the two encoders and two decoders may be trained in advance, with joint training performed using the context sentences of existing dialogues, the real word sequences obtained through steps (1) to (5), and the dialogue replies, to obtain the optimal model.
During training, the loss functions of the two decoding stages are computed separately:

L_1 = -Σ_{i=1}^{K} log P(c_i)
L_2 = -Σ_{j=1}^{J} log P(r_j)

giving the joint loss function:

L_total = L_1 + L_2

The model parameters are then updated by back propagation based on the joint loss function L_total, yielding the final optimal dialogue generation model.
Example 2
This embodiment discloses a dialog generating device based on two-stage decoding, which can implement the dialog generating method in embodiment 1. The dialogue generating device comprises a mapping module, a context self-attention encoder, a first-stage Transformer decoder, a real word sequence self-attention encoder and a second-stage Transformer decoder which are connected in sequence, wherein the context self-attention encoder is also connected to the second-stage Transformer decoder.
The mapping module takes the text of the conversation context as input and can be used for mapping each word in the text into a word embedding vector;
for a context self-attention encoder, a word embedding vector is used as input by taking a sentence as a unit, and a feature vector of a context can be extracted;
For the first-stage Transformer decoder, the context feature vector is taken as input and decoded to generate a real word sequence, which expresses the main semantic information of the final reply;
the real word sequence self-attention encoder takes the real word sequence as input and can be used for encoding to obtain a characteristic vector of the real word sequence;
For the second-stage Transformer decoder, the feature vector of the context and the feature vector of the real word sequence are taken as input and decoded to generate the final reply.
It should be noted that, the apparatus of this embodiment is only exemplified by the division of the above functional modules, and in practical applications, the above functions may be distributed by different functional modules as needed, that is, the internal structure may be divided into different functional modules to complete all or part of the above described functions.
Example 3
The present embodiment discloses a computer-readable storage medium, which stores a program, and when the program is executed by a processor, the method for generating a dialog based on two-stage decoding according to embodiment 1 is implemented, specifically:
(1) inputting the text of the dialogue context into the model and mapping each word in the text into a word embedding vector;
(2) inputting the word embedding vectors, sentence by sentence, into a context self-attention encoder, which extracts the feature vector of the context;
(3) inputting the obtained context feature vector into a first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply;
(4) inputting the obtained real word sequence into a real word sequence self-attention encoder to obtain the feature vector of the real word sequence;
(5) inputting the encoded feature vectors of the context and of the real word sequence into a second-stage Transformer decoder, which decodes them to generate the final reply.
The computer-readable storage medium in this embodiment may be a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a usb disk, a removable hard disk, or other media.
Example 4
The embodiment discloses a computing device, which includes a processor and a memory for storing an executable program of the processor, and when the processor executes the program stored in the memory, the dialog generation method based on two-stage decoding described in embodiment 1 is implemented, specifically:
(1) inputting the text of the dialogue context into the model and mapping each word in the text into a word embedding vector;
(2) inputting the word embedding vectors, sentence by sentence, into a context self-attention encoder, which extracts the feature vector of the context;
(3) inputting the obtained context feature vector into a first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply;
(4) inputting the obtained real word sequence into a real word sequence self-attention encoder to obtain the feature vector of the real word sequence;
(5) inputting the encoded feature vectors of the context and of the real word sequence into a second-stage Transformer decoder, which decodes them to generate the final reply.
The computing device described in this embodiment may be a desktop computer, a notebook computer, a smart phone, a PDA handheld terminal, a tablet computer, or other terminal device with a processor function.
The above description covers only the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any substitution or modification that a person skilled in the art could make, within the scope disclosed by the present invention, to the technical solution and inventive concept of the present invention, and any equivalent thereof, falls within the protection scope of the present invention.

Claims (10)

1. A dialogue generation method based on two-stage decoding, characterized in that a dialogue is generated through a two-stage-decoding dialogue generation model comprising two self-attention encoders and two Transformer decoders, the method comprising the steps of:
(1) inputting the text of the dialogue context into the model and mapping each word in the text into a word embedding vector;
(2) inputting the word embedding vectors, sentence by sentence, into a context self-attention encoder, which extracts the feature vector of the context;
(3) inputting the obtained context feature vector into a first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply;
(4) inputting the obtained real word sequence into a real word sequence self-attention encoder to obtain the feature vector of the real word sequence;
(5) inputting the encoded feature vectors of the context and of the real word sequence into a second-stage Transformer decoder, which decodes them to generate the final reply.
2. The dialogue generation method of claim 1, wherein in step (1), the word embedding vector to which the ith word in the text is mapped is represented as:

Emb(x_i) = e(x_i) + PE(i)

where i denotes the position of the word in the text; x_i denotes the ith word in the text; e(x_i) denotes the initial word embedding vector to which x_i is mapped; and PE(i) denotes the sinusoidal position encoding function, which encodes position i into a vector so that the model can exploit the order information of the text. Specifically, let d be the dimension of the final vector; the even vector positions take the value PE(i, 2z) = sin(i / 10000^(2z/d)) and the odd vector positions take the value PE(i, 2z+1) = cos(i / 10000^(2z/d)), where z ∈ [0, ..., d/2], finally yielding a position code of dimension d.
3. The dialogue generation method according to claim 1, wherein in step (2), the extracted context feature vector is:

PSAE(U) = FFN(G)
G = MultiHead(In_s, In_s, In_s)
In_s = [s_1, s_2, ..., s_I]

where In_s = [s_1, s_2, ..., s_I] denotes the matrix formed by all word embedding vectors of the input text, whose length is I; MultiHead(In_s, In_s, In_s) denotes feeding In_s into a multi-head self-attention function as Q, K and V, with G the final output feature of that function; and FFN(G) denotes feeding the feature G into the fully connected layer FFN, giving the final context feature vector PSAE(U) = FFN(G), where U = [x_1, ..., x_L] denotes the input dialogue context and L is its total number of words.
4. The dialog generation method according to claim 1, wherein in step (3), when generating the ith real word c_i, the matrix formed by the word embedding vectors of the already generated real word sequence c_{≤i-1} is represented as:
In_c^{(i)} = [c_1, c_2, ..., c_{i-1}]
In the decoding process of the first-stage Transformer decoder, two multi-head attention layers are required, namely:
A_1^{(i)} = MultiHead(In_c^{(i)}, In_c^{(i)}, In_c^{(i)})
H_1^{(i)} = MultiHead(A_1^{(i)}, PSAE(U), PSAE(U))
wherein the subscript 1 denotes the first-stage decoding process, and PSAE(U) represents the extracted context feature vector;
to obtain the ith real word c_i, H_1^{(i)} must further pass through the fully connected layer and the softmax layer, with the formulas:
F_1^{(i)} = FFN(H_1^{(i)})
P(c_i) = softmax(F_1^{(i)})
finally, the ith real word c_i is generated according to the probability distribution P(c_i);
by analogy, the first-stage Transformer decoder finally decodes to generate the whole real word sequence C = [c_1, ..., c_K], wherein K is the total number of real words in the real word sequence.
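The two attention layers and the output softmax of claim 4 can be sketched as one decoding step. Single-head attention stands in for MultiHead here and a single linear map stands in for the FFN, so this is a structural illustration under those simplifying assumptions, not the patented implementation:

```python
import numpy as np

def attend(Q, K, V):
    """Scaled dot-product attention; one head stands in for MultiHead."""
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def first_stage_step(In_prev, ctx_feat, W_vocab):
    """One step of the first-stage decoder:
    self-attention over the already generated words, cross-attention to
    the context features PSAE(U), then a linear map (standing in for the
    FFN) and softmax giving the distribution P(c_i) over the vocabulary."""
    A1 = attend(In_prev, In_prev, In_prev)
    H1 = attend(A1, ctx_feat, ctx_feat)
    F1 = H1[-1] @ W_vocab          # logits for the next position only
    e = np.exp(F1 - F1.max())
    return e / e.sum()
```

The second-stage step of claim 6 has the same shape, with the cross-attention reading the real-word-sequence features instead of the context features.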
5. The dialog generation method according to claim 1, wherein in step (4), the feature vector extracted by the real-word-sequence self-attention encoder is:
CwSAE(C) = FFN(G)
G = MultiHead(In_c, In_c, In_c)
In_c = [c_1, c_2, ..., c_K]
wherein In_c = [c_1, c_2, ..., c_K] represents the matrix formed by all word embedding vectors in the real word sequence; MultiHead(In_c, In_c, In_c) and FFN(G) denote the multi-head self-attention function and the fully connected function respectively, yielding the final real-word-sequence feature vector CwSAE(C).
6. The dialog generation method of claim 1, wherein in step (5), when generating the ith reply word r_i, the matrix formed by the word embedding vectors of the already generated reply sequence r_{≤i-1} is represented as:
In_r^{(i)} = [r_1, r_2, ..., r_{i-1}]
In the decoding process of the second-stage Transformer decoder, two multi-head attention layers are required, namely:
A_2^{(i)} = MultiHead(In_r^{(i)}, In_r^{(i)}, In_r^{(i)})
H_2^{(i)} = MultiHead(A_2^{(i)}, CwSAE(C), CwSAE(C))
wherein the subscript 2 denotes the second-stage decoding process, and CwSAE(C) represents the extracted feature vector of the real word sequence;
to obtain the ith reply word r_i, H_2^{(i)} must further pass through the fully connected layer and the softmax layer, with the formulas:
F_2^{(i)} = FFN(H_2^{(i)})
P(r_i) = softmax(F_2^{(i)})
finally, the ith reply word r_i is generated according to the probability distribution P(r_i);
by analogy, the second-stage Transformer decoder finally decodes to generate the whole reply sequence R = [r_1, ..., r_J], wherein J is the total number of reply words in the reply sequence.
7. The dialog generation method according to claim 1, wherein the model is trained on the context sentences of existing dialogs together with the real word sequences and dialog replies obtained in steps (1) to (5), so as to obtain an optimal model;
in the training process, the loss functions of the two decoding stages are calculated separately:
L_1 = -Σ_{i=1}^{K} log P(c_i)
L_2 = -Σ_{i=1}^{J} log P(r_i)
and the joint loss function is obtained as:
L_total = L_1 + L_2
and then the model parameters are updated by back-propagation based on the joint loss function L_total, obtaining the final optimal dialog generation model.
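The patent does not spell out the per-stage losses; assuming the usual negative log-likelihood over the step distributions P(c_i) and P(r_i), the joint loss can be sketched as:

```python
import numpy as np

def stage_loss(step_probs, target_ids):
    """Negative log-likelihood of one decoded sequence:
    -sum_i log P(target_i), with P taken from each step's softmax output."""
    return -sum(np.log(p[t]) for p, t in zip(step_probs, target_ids))

def joint_loss(stage1_probs, real_words, stage2_probs, reply_words):
    # L_total = L_1 + L_2, summed over the two decoding stages
    return (stage_loss(stage1_probs, real_words)
            + stage_loss(stage2_probs, reply_words))
```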
8. A dialog generating device based on two-stage decoding is characterized by comprising a mapping module, a context self-attention encoder, a first-stage Transformer decoder, a real word sequence self-attention encoder and a second-stage Transformer decoder which are sequentially connected, wherein the context self-attention encoder is also connected to the second-stage Transformer decoder;
the mapping module is used for taking a text of a conversation context as input and mapping each word in the text into a word embedding vector;
the context self-attention encoder is used for taking a sentence as a unit, embedding words into a vector as input, and extracting a feature vector of a context;
the first-stage Transformer decoder is used for taking the context feature vector as input and decoding to generate the real word sequence, which expresses the main semantic information of the final reply;
the real word sequence self-attention encoder is used for inputting the real word sequence and encoding to obtain a characteristic vector of the real word sequence;
and the second-stage Transformer decoder is used for taking the feature vector of the context and the feature vector of the real word sequence as input and decoding to generate the final reply.
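The wiring of the five modules in claim 8 can be sketched as a pipeline; the module internals (defined by the earlier method claims) are passed in as callables, and all names are illustrative:

```python
class TwoStageDialogGenerator:
    """Data flow of the five modules in claim 8 (internals are stubs)."""

    def __init__(self, embed, context_encoder, stage1_decoder,
                 realword_encoder, stage2_decoder):
        self.embed = embed                        # mapping module
        self.context_encoder = context_encoder    # context self-attention encoder
        self.stage1_decoder = stage1_decoder      # first-stage Transformer decoder
        self.realword_encoder = realword_encoder  # real-word-sequence encoder
        self.stage2_decoder = stage2_decoder      # second-stage Transformer decoder

    def reply(self, context_text):
        embeddings = self.embed(context_text)
        ctx_feat = self.context_encoder(embeddings)
        real_words = self.stage1_decoder(ctx_feat)
        rw_feat = self.realword_encoder(real_words)
        # the second-stage decoder sees both feature vectors, as in claim 1
        return self.stage2_decoder(ctx_feat, rw_feat)
```

With toy stub modules, reply() exercises the full map, encode, decode, encode, decode chain end to end.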
9. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the dialog generation method based on two-stage decoding according to any one of claims 1 to 7.
10. A computing device comprising a processor and a memory for storing processor-executable programs, wherein the processor, when executing a program stored in the memory, implements the two-stage decoding-based dialog generation method of any of claims 1 to 7.
CN202110248798.1A 2021-03-08 2021-03-08 Dialog generation method and device based on two-stage decoding, medium and computing equipment Pending CN112988967A (en)

Publications (1)

Publication Number: CN112988967A, Publication Date: 2021-06-18



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210618