CN112988967A - Dialog generation method and device based on two-stage decoding, medium and computing equipment - Google Patents


Info

Publication number: CN112988967A
Application number: CN202110248798.1A
Authority: CN (China)
Prior art keywords: context, real word sequence, vector, word
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 蔡毅 (Cai Yi), 钟志成 (Zhong Zhicheng), 孔俊生 (Kong Junsheng)
Current assignee: South China University of Technology SCUT (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: South China University of Technology SCUT
Application filed by South China University of Technology SCUT
Priority to CN202110248798.1A
Publication of CN112988967A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3344: Query execution using natural language analysis
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G06F16/3347: Query execution using vector based model
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/047: Probabilistic or stochastic networks

Abstract

The invention discloses a dialogue generation method, device, medium and computing device based on two-stage decoding. The method divides the reply generation process of a dialogue into two decoding stages. First, the dialogue context is input into the dialogue generation model and mapped into word embedding vectors. The word vectors are then input into a context self-attention encoder to obtain the feature vector of the dialogue context, and that feature vector is input into a first-stage Transformer decoder, which decodes it to generate a sequence of real words (content words). The real word sequence is input into a real word sequence encoder to obtain its feature vector. Finally, the feature vectors of the context and of the real word sequence are input together into a second-stage Transformer decoder, which decodes them to generate the final reply. Through the two-stage decoding process, the invention prevents virtual words (function words), which occur frequently but carry little semantic information, from interfering with the real words, thereby improving the relevance and informativeness of the reply.

Description

Dialog generation method and device based on two-stage decoding, medium and computing equipment
Technical Field
The invention relates to the technical field of natural language processing, in particular to a dialog generation method and device based on two-stage decoding, a medium and computing equipment.
Background
In recent years, with the development of deep learning technology and the appearance of large dialogue datasets, open-domain dialogue systems can be built with deep learning techniques, greatly expanding the application scenarios of dialogue systems.
In the field of open-domain dialogue generation, the current mainstream practice is based on an end-to-end generation framework: an encoder encodes the dialogue context into a feature vector, and a decoder decodes the dialogue reply from that vector. However, basic end-to-end dialogue generation models tend to produce generic, uninformative replies such as "OK" or "I don't know". In recent years many solutions to this problem have been proposed, such as adding personalized information, topic information, or external knowledge so that the model better understands the semantics of the context. However, these methods still use a single decoder to generate the whole dialogue reply in one pass, without distinguishing real words (content words) from virtual words (function words). As a result, the model tends to generate virtual words, which carry little semantic information but occur frequently, rather than real words, which carry more semantic information but occur less frequently, so the model still produces generic, uninformative replies.
Disclosure of Invention
A first object of the present invention is to overcome the disadvantages of the prior art and to provide a dialogue generation method based on two-stage decoding. In this method, the sequence of real words required in the reply is generated in the first decoding stage, and the final dialogue reply is generated in the second decoding stage based on the generated real word sequence and the context information. This effectively avoids the influence of high-frequency virtual words on the generation of low-frequency real words, improving the relevance and information content of the reply while preserving its fluency.
A second object of the present invention is to provide a dialog generating device based on two-stage decoding.
A third object of the present invention is to provide a computer-readable storage medium.
It is a fourth object of the invention to provide a computing device.
The first purpose of the invention can be realized by the following technical scheme:
A dialogue generation method based on two-stage decoding, which generates a dialogue through a two-stage-decoding dialogue generation model comprising two self-attention encoders and two Transformer decoders, the method comprising the steps of:
(1) inputting the text of the dialogue context into the model and mapping each word in the text into a word embedding vector;
(2) inputting the word embedding vectors, sentence by sentence, into a context self-attention encoder, which extracts the feature vector of the context;
(3) inputting the obtained context feature vector into a first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply;
(4) inputting the obtained real word sequence into a real word sequence self-attention encoder to obtain the feature vector of the real word sequence;
(5) inputting the encoded feature vectors of the context and of the real word sequence into a second-stage Transformer decoder, which decodes them to generate the final reply.
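As a sketch only, the data flow of steps (1) to (5) can be written as a single pipeline. Every function below (embed, encode, decode_stage1, decode_stage2) is a trivial stand-in invented for illustration, not the patent's trained modules:

```python
# Illustrative data flow of the five steps; each component below is a
# trivial stand-in for the corresponding trained module, not the real model.

def embed(context):                        # step (1): map words to vectors
    return [float(len(w)) for sent in context for w in sent.split()]

def encode(vectors):                       # steps (2)/(4): self-attention encoder
    return sum(vectors) / max(len(vectors), 1)

def decode_stage1(ctx_feat):               # step (3): real (content) word sequence
    return ["weather", "sunny"]

def decode_stage2(ctx_feat, cw_feat):      # step (5): final reply
    return "the weather is sunny today"

def generate_reply(context):
    ctx_feat = encode(embed(context))                      # (1) + (2)
    real_words = decode_stage1(ctx_feat)                   # (3)
    cw_feat = encode([float(len(w)) for w in real_words])  # (4)
    return decode_stage2(ctx_feat, cw_feat)                # (5)

reply = generate_reply(["how is the weather today"])
```

The point of the decomposition is that stage (3) commits to the semantically heavy content words before stage (5) fills in the fluent surface form around them.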
Preferably, in step (1), the word embedding vector to which the ith word in the text is mapped is represented as:

Emb(x_i) = e(x_i) + PE(i)

where i denotes the position of the word in the text; x_i denotes the ith word in the text; e(x_i) denotes the initial word embedding vector to which x_i is mapped; and PE(i) denotes the sinusoidal position encoding function, which encodes position i into a vector so that the model can exploit the order information of the text. Specifically, let d be the dimension of the final vector; the even vector positions take the value PE(i, 2z) = sin(i / 10000^(2z/d)) and the odd vector positions take the value PE(i, 2z+1) = cos(i / 10000^(2z/d)), where z ∈ [0, ..., d/2], finally yielding a position code of dimension d.
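A minimal NumPy sketch of the sinusoidal position encoding described above; the base constant 10000 follows the standard Transformer formulation, and the function name is ours:

```python
import numpy as np

def positional_encoding(i, d):
    """Sinusoidal position code PE(i) of even dimension d."""
    pe = np.zeros(d)
    z = np.arange(d // 2)
    pe[0::2] = np.sin(i / 10000 ** (2 * z / d))  # even positions: PE(i, 2z)
    pe[1::2] = np.cos(i / 10000 ** (2 * z / d))  # odd positions: PE(i, 2z+1)
    return pe

# word embedding = initial embedding + position code (same dimension d)
d = 8
e_xi = np.ones(d)                  # stand-in for the initial embedding e(x_i)
emb = e_xi + positional_encoding(3, d)
```

Because PE depends only on the position i, two occurrences of the same word at different positions receive different final embeddings.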
Preferably, in step (2), the extracted context feature vector is:

PSAE(U) = FFN(G)
G = MultiHead(In_s, In_s, In_s)
In_s = [s_1, s_2, ..., s_I]

where In_s = [s_1, s_2, ..., s_I] denotes the matrix formed by all word embedding vectors of the input text, whose length is I; MultiHead(In_s, In_s, In_s) denotes feeding In_s into a multi-head self-attention function as Q, K and V, with G the final output feature of that function; and FFN(G) denotes feeding the feature G obtained from the multi-head self-attention function into the fully connected layer FFN, giving the final context feature vector PSAE(U) = FFN(G), where U = [x_1, ..., x_L] denotes the input dialogue context and L is its total number of words.
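A minimal sketch of the encoder computation PSAE(U) = FFN(MultiHead(In_s, In_s, In_s)), simplified to a single attention head with no learned Q/K/V projections and random FFN weights; it shows only the shape flow, not the trained model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # Scaled dot-product self-attention with Q = K = V = X,
    # i.e. MultiHead(In_s, In_s, In_s) reduced to one head, no projections.
    d = X.shape[-1]
    A = softmax(X @ X.T / np.sqrt(d))   # attention weights, rows sum to 1
    return A @ X

def ffn(G, W1, b1, W2, b2):
    # Position-wise fully connected layer FFN(G) with a ReLU activation.
    return np.maximum(G @ W1 + b1, 0) @ W2 + b2

rng = np.random.default_rng(0)
In_s = rng.normal(size=(5, 8))          # 5 word-embedding vectors, d = 8
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)
PSAE_U = ffn(self_attention(In_s), W1, b1, W2, b2)
```

The same two-layer pattern (self-attention followed by FFN) is reused by the real word sequence encoder CwSAE in step (4), only with In_c as input.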
Preferably, in step (3), when generating the ith real word c_i, the matrix formed by all word embedding vectors of the already generated real word sequence c_{≤i-1} is denoted In_c^{(i)}.

During decoding, the first-stage Transformer decoder passes through two multi-head attention layers, namely:

H'_1^{(i)} = MultiHead(In_c^{(i)}, In_c^{(i)}, In_c^{(i)})
H_1^{(i)} = MultiHead(H'_1^{(i)}, PSAE(U), PSAE(U))

where the subscript 1 denotes the first decoding stage and PSAE(U) denotes the extracted context feature vector.

To obtain the ith real word c_i, H_1^{(i)} must further be processed by a fully connected layer and a softmax layer, as follows:

F_1^{(i)} = FFN(H_1^{(i)})
P(c_i) = softmax(F_1^{(i)})

Finally, the ith real word c_i is generated according to the probability distribution P(c_i).

By analogy, the first-stage Transformer decoder finally decodes the whole real word sequence C = [c_1, ..., c_K], where K is the total number of real words in the sequence.
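The word-by-word (autoregressive) first-stage decoding can be sketched as a greedy loop. decode_step below is a toy stand-in for the decoder stack: the two attention layers are replaced by a simple mixing rule, and the vocabulary size is set equal to the feature dimension purely for brevity:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(prefix_ids, ctx_feat, W):
    # Toy stand-in for self-attention over the generated prefix,
    # cross-attention to PSAE(U), FFN, then softmax -> P(c_i).
    h = ctx_feat + sum(W[t] for t in prefix_ids)
    return softmax(h)

vocab, d = 10, 10                       # vocab == d only to keep the toy short
rng = np.random.default_rng(1)
W = rng.normal(size=(vocab, d))         # toy "embedding" rows per token id
ctx_feat = rng.normal(size=d)           # stand-in for PSAE(U)

BOS, EOS, seq = 0, 1, [0]
for _ in range(5):                      # generate at most K = 5 real words
    p = decode_step(seq, ctx_feat, W)   # probability distribution P(c_i)
    c_i = int(np.argmax(p))             # greedy choice of the next real word
    seq.append(c_i)
    if c_i == EOS:
        break
```

Each iteration conditions on everything generated so far, mirroring how In_c^{(i)} grows by one embedding per step.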
Preferably, in step (4), the feature vector of the real word sequence extracted by the real word sequence self-attention encoder is:

CwSAE(C) = FFN(G)
G = MultiHead(In_c, In_c, In_c)
In_c = [c_1, c_2, ..., c_K]

where In_c = [c_1, c_2, ..., c_K] denotes the matrix formed by all word embedding vectors of the real word sequence; MultiHead(In_c, In_c, In_c) and FFN(G) denote the multi-head self-attention function and the fully connected function respectively, giving the final real word sequence feature vector CwSAE(C).
Preferably, in step (5), when generating the ith reply word r_i, the matrix formed by all word embedding vectors of the already generated reply sequence r_{≤i-1} is denoted In_r^{(i)}.

During decoding, the second-stage Transformer decoder likewise passes through two multi-head attention layers, namely:

H'_2^{(i)} = MultiHead(In_r^{(i)}, In_r^{(i)}, In_r^{(i)})
H_2^{(i)} = MultiHead(H'_2^{(i)}, [PSAE(U); CwSAE(C)], [PSAE(U); CwSAE(C)])

where the subscript 2 denotes the second decoding stage, CwSAE(C) denotes the extracted feature vector of the real word sequence, and [PSAE(U); CwSAE(C)] denotes the context and real word sequence feature vectors supplied together to the attention layer.

To obtain the ith reply word r_i, H_2^{(i)} must further be processed by a fully connected layer and a softmax layer, as follows:

F_2^{(i)} = FFN(H_2^{(i)})
P(r_i) = softmax(F_2^{(i)})

Finally, the ith reply word r_i is generated according to the probability distribution P(r_i).

By analogy, the second-stage Transformer decoder finally decodes the whole reply sequence R = [r_1, ..., r_J], where J is the total number of words in the reply.
Preferably, the model is trained with the context sentences of existing dialogues, the real word sequences and the dialogue replies obtained through steps (1) to (5), yielding the optimal model.

During training, the loss functions of the two decoding stages are computed separately:

L_1 = -Σ_{i=1}^{K} log P(c_i)
L_2 = -Σ_{j=1}^{J} log P(r_j)

giving the joint loss function:

L_total = L_1 + L_2

The model parameters are then updated by back propagation based on the joint loss function L_total, yielding the final optimal dialogue generation model.
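A small numeric sketch of the two stage losses and their sum, assuming the standard negative log-likelihood reading of L_1 and L_2; the per-step probability distributions are made-up toy values:

```python
import numpy as np

def nll(probs, targets):
    # Negative log-likelihood of one decoded sequence: -sum_i log P(w_i),
    # where probs[i] is the model's distribution at step i.
    return -float(np.sum(np.log([p[t] for p, t in zip(probs, targets)])))

# Toy per-step distributions P(c_i) and P(r_j) over a 4-word vocabulary.
P_c = np.array([[0.7, 0.1, 0.1, 0.1],
                [0.1, 0.8, 0.05, 0.05]])   # K = 2 real-word steps
P_r = np.array([[0.6, 0.2, 0.1, 0.1]])     # J = 1 reply-word step

L1 = nll(P_c, [0, 1])      # stage-1 loss over the real word sequence
L2 = nll(P_r, [0])         # stage-2 loss over the reply
L_total = L1 + L2          # joint loss, back-propagated to update parameters
```

Summing the two losses trains both decoders jointly, so gradients from the final reply also flow into the modules shared with the first stage.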
The second purpose of the invention can be realized by the following technical scheme:
a dialog generating device based on two-stage decoding comprises a mapping module, a context self-attention encoder, a first-stage Transformer decoder, a real word sequence self-attention encoder and a second-stage Transformer decoder which are sequentially connected, wherein the context self-attention encoder is also connected to the second-stage Transformer decoder;
the mapping module is used for taking a text of a conversation context as input and mapping each word in the text into a word embedding vector;
the context self-attention encoder is used for taking a sentence as a unit, embedding words into a vector as input, and extracting a feature vector of a context;
the first stage Transformer decoder is used for decoding the context feature vector as an input to generate a real word sequence, and the real word sequence expresses main semantic information in final reply;
the real word sequence self-attention encoder is used for inputting the real word sequence and encoding to obtain a characteristic vector of the real word sequence;
The second-stage Transformer decoder takes the feature vector of the context and the feature vector of the real word sequence as input and decodes them to generate the final reply.
The third object of the invention can be achieved by the following technical solutions: a computer-readable storage medium storing a program which, when executed by a processor, implements a dialog generation method based on two-stage decoding according to the first object of the present invention.
The fourth purpose of the invention can be realized by the following technical scheme: a computing device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the dialog generating method based on two-stage decoding according to the first object of the present invention.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention decouples the usual single decoding process into two stages and generates the real word sequence separately, avoiding the influence of virtual words, which occur frequently but lack semantic information, on the real words. This improves the relevance and information content of the generated reply and reduces the generation of generic, uninformative replies.
(2) The dialogue generation model is divided into a context self-attention encoder, a real word sequence self-attention encoder and a two-stage decoder, giving the model better interpretability and making it easier to control: reply generation can be adjusted by adjusting the model's inputs.
Drawings
Fig. 1 is a flowchart of a dialog generation method based on two-stage decoding according to the present invention.
FIG. 2 is a schematic diagram of a dialog generation model.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example 1
This embodiment discloses a dialogue generation method based on two-stage decoding. The method generates a dialogue through a two-stage-decoding dialogue generation model and can be applied to language-response systems in the field of human-computer interaction, such as chat robots and voice-controlled smart home products.
The structure of the dialogue generation model is shown in fig. 2. It mainly comprises two Self-Attention Encoders (SAE), namely a context Self-Attention Encoder (PSAE) and a real word sequence Self-Attention Encoder (CwSAE), which extract features from the context sentences U and the real word sequence C respectively, and two Transformer decoders: a first-stage Transformer decoder and a second-stage Transformer decoder. The context self-attention encoder is connected to both the first-stage and second-stage Transformer decoders, and the first-stage Transformer decoder, the real word sequence self-attention encoder and the second-stage Transformer decoder are connected in sequence.
As shown in fig. 1, the dialog generation method includes the steps of:
(1) The text of a dialogue context (the text contains several sentences) is input into the model, and each word in the text is mapped to a word embedding vector. The words include real words (content words such as verbs and nouns) and virtual words (function words such as modal particles).
The word embedding vector to which the ith word in the text is mapped is represented as:

Emb(x_i) = e(x_i) + PE(i)

where + denotes vector addition; x_i denotes the ith word in the text; e(x_i) denotes the initial word embedding vector to which x_i is mapped; and PE(i) denotes the sinusoidal position encoding function, which encodes position i into a vector so that the model can exploit the order information of the text. Specifically, let d be the dimension of the final vector; the even vector positions take the value PE(i, 2z) = sin(i / 10000^(2z/d)) and the odd vector positions take the value PE(i, 2z+1) = cos(i / 10000^(2z/d)), where z ∈ [0, ..., d/2], finally yielding a position code of dimension d.
(2) The word embedding vectors are input, sentence by sentence, into the context self-attention encoder, which extracts the feature vector of the context.
The extracted context feature vector is:

PSAE(U) = FFN(G)
G = MultiHead(In_s, In_s, In_s)
In_s = [s_1, s_2, ..., s_I]

where In_s = [s_1, s_2, ..., s_I] denotes the matrix formed by all word embedding vectors of the input text, whose length is I; MultiHead(In_s, In_s, In_s) denotes feeding In_s into a multi-head self-attention function as Q, K and V, with G the final output feature of that function; and FFN(G) denotes feeding the feature G into the fully connected layer FFN, giving the final context sentence feature vector PSAE(U) = FFN(G), where U = [x_1, ..., x_L] denotes the input dialogue context and L is its total number of words.
(3) The extracted context feature vector is input into the first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply.
When generating the ith real word c_i, the matrix formed by all word embedding vectors of the already generated real word sequence c_{≤i-1} is denoted In_c^{(i)}.

During decoding, the first-stage Transformer decoder passes through two multi-head attention layers, namely:

H'_1^{(i)} = MultiHead(In_c^{(i)}, In_c^{(i)}, In_c^{(i)})
H_1^{(i)} = MultiHead(H'_1^{(i)}, PSAE(U), PSAE(U))

where the subscript 1 denotes the first decoding stage and PSAE(U) denotes the extracted context feature vector.

To obtain the ith real word c_i, H_1^{(i)} must further be processed by a fully connected layer and a softmax layer, as follows:

F_1^{(i)} = FFN(H_1^{(i)})
P(c_i) = softmax(F_1^{(i)})

Finally, the ith real word c_i is generated according to the probability distribution P(c_i).

By analogy, decoding finally generates the whole real word sequence C = [c_1, ..., c_K], where K is the total number of real words in the sequence.
(4) The real word sequence C obtained by the first-stage decoding is input into the real word sequence self-attention encoder to obtain the feature vector of the real word sequence, expressed as:

CwSAE(C) = FFN(G)
G = MultiHead(In_c, In_c, In_c)
In_c = [c_1, c_2, ..., c_K]

where In_c = [c_1, c_2, ..., c_K] denotes the matrix formed by all word embedding vectors of the real word sequence, and MultiHead(In_c, In_c, In_c) and FFN(G) denote the multi-head self-attention function and the fully connected function respectively: In_c is the input of the multi-head self-attention function, whose final output feature G is fed into the fully connected function FFN, finally giving the real word sequence feature vector CwSAE(C).
(5) The encoded feature vector of the context and the feature vector of the real word sequence are input into the second-stage Transformer decoder, which decodes them to generate the final reply.
When generating the ith reply word r_i, the matrix formed by all word embedding vectors of the already generated reply sequence r_{≤i-1} is denoted In_r^{(i)}.

Similarly to the first-stage decoder, during decoding the second-stage Transformer decoder passes through two multi-head attention layers, namely:

H'_2^{(i)} = MultiHead(In_r^{(i)}, In_r^{(i)}, In_r^{(i)})
H_2^{(i)} = MultiHead(H'_2^{(i)}, [PSAE(U); CwSAE(C)], [PSAE(U); CwSAE(C)])

where the subscript 2 denotes the second decoding stage, and CwSAE(C) denotes the extracted feature vector of the real word sequence.

To obtain the ith reply word r_i, H_2^{(i)} must further be processed by a fully connected layer and a softmax layer, as follows:

F_2^{(i)} = FFN(H_2^{(i)})
P(r_i) = softmax(F_2^{(i)})

Finally, the ith reply word r_i is generated according to the probability distribution P(r_i).

By analogy, decoding generates the whole reply sequence R = [r_1, ..., r_J], where J is the total number of words in the reply.
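Assuming the second cross-attention layer attends over the two feature matrices stacked into a single memory (one plausible reading of "the context and the feature vector of the real word sequence are input together"; the patent's figure may specify a different combination), a single-head sketch without learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, M):
    # Single-head attention of reply states Q over an external memory M.
    d = Q.shape[-1]
    return softmax(Q @ M.T / np.sqrt(d), axis=-1) @ M

rng = np.random.default_rng(2)
H2_prime = rng.normal(size=(3, 8))   # H'_2: self-attended reply prefix states
PSAE_U = rng.normal(size=(5, 8))     # context features
CwSAE_C = rng.normal(size=(4, 8))    # real-word-sequence features

# Stack both feature matrices into one memory so each reply position can
# attend to context tokens and real words at the same time (assumption).
M = np.vstack([PSAE_U, CwSAE_C])
H2 = cross_attention(H2_prime, M)
```

Under this reading, the attention weights decide per reply word how much to copy from the planned real words versus the surrounding context.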
In this embodiment, the two encoders and two decoders may be trained in advance, with joint training performed using the context sentences of existing dialogues, the real word sequences obtained through steps (1) to (5), and the dialogue replies, to obtain the optimal model.
During training, the loss functions of the two decoding stages are computed separately:

L_1 = -Σ_{i=1}^{K} log P(c_i)
L_2 = -Σ_{j=1}^{J} log P(r_j)

giving the joint loss function:

L_total = L_1 + L_2

The model parameters are then updated by back propagation based on the joint loss function L_total, yielding the final optimal dialogue generation model.
Example 2
This embodiment discloses a dialog generating device based on two-stage decoding, which can implement the dialog generating method in embodiment 1. The dialogue generating device comprises a mapping module, a context self-attention encoder, a first-stage Transformer decoder, a real word sequence self-attention encoder and a second-stage Transformer decoder which are connected in sequence, wherein the context self-attention encoder is also connected to the second-stage Transformer decoder.
The mapping module takes the text of the conversation context as input and can be used for mapping each word in the text into a word embedding vector;
for a context self-attention encoder, a word embedding vector is used as input by taking a sentence as a unit, and a feature vector of a context can be extracted;
For the first-stage Transformer decoder, the context feature vector is taken as input and decoded to generate a real word sequence, which expresses the main semantic information of the final reply;
the real word sequence self-attention encoder takes the real word sequence as input and can be used for encoding to obtain a characteristic vector of the real word sequence;
For the second-stage Transformer decoder, the feature vector of the context and the feature vector of the real word sequence are taken as input and decoded to generate the final reply.
It should be noted that, the apparatus of this embodiment is only exemplified by the division of the above functional modules, and in practical applications, the above functions may be distributed by different functional modules as needed, that is, the internal structure may be divided into different functional modules to complete all or part of the above described functions.
Example 3
The present embodiment discloses a computer-readable storage medium, which stores a program, and when the program is executed by a processor, the method for generating a dialog based on two-stage decoding according to embodiment 1 is implemented, specifically:
(1) inputting the text of the dialogue context into the model and mapping each word in the text into a word embedding vector;
(2) inputting the word embedding vectors, sentence by sentence, into a context self-attention encoder, which extracts the feature vector of the context;
(3) inputting the obtained context feature vector into a first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply;
(4) inputting the obtained real word sequence into a real word sequence self-attention encoder to obtain the feature vector of the real word sequence;
(5) inputting the encoded feature vectors of the context and of the real word sequence into a second-stage Transformer decoder, which decodes them to generate the final reply.
The computer-readable storage medium in this embodiment may be a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a usb disk, a removable hard disk, or other media.
Example 4
The embodiment discloses a computing device, which includes a processor and a memory for storing an executable program of the processor, and when the processor executes the program stored in the memory, the dialog generation method based on two-stage decoding described in embodiment 1 is implemented, specifically:
(1) inputting the text of the dialogue context into the model and mapping each word in the text into a word embedding vector;
(2) inputting the word embedding vectors, sentence by sentence, into a context self-attention encoder, which extracts the feature vector of the context;
(3) inputting the obtained context feature vector into a first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply;
(4) inputting the obtained real word sequence into a real word sequence self-attention encoder to obtain the feature vector of the real word sequence;
(5) inputting the encoded feature vectors of the context and of the real word sequence into a second-stage Transformer decoder, which decodes them to generate the final reply.
The computing device described in this embodiment may be a desktop computer, a notebook computer, a smart phone, a PDA handheld terminal, a tablet computer, or other terminal device with a processor function.
The above description covers only the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any substitution or modification that a person skilled in the art could make, within the scope disclosed by the present invention, to the technical solution and inventive concept of the present invention, and any equivalent thereof, falls within the protection scope of the present invention.

Claims (10)

1. A dialogue generation method based on two-stage decoding, characterized in that a dialogue is generated through a two-stage-decoding dialogue generation model comprising two self-attention encoders and two Transformer decoders, the method comprising the steps of:
(1) inputting the text of the dialogue context into the model and mapping each word in the text into a word embedding vector;
(2) inputting the word embedding vectors, sentence by sentence, into a context self-attention encoder, which extracts the feature vector of the context;
(3) inputting the obtained context feature vector into a first-stage Transformer decoder, which decodes it to generate a real word sequence expressing the main semantic information of the final reply;
(4) inputting the obtained real word sequence into a real word sequence self-attention encoder to obtain the feature vector of the real word sequence;
(5) inputting the encoded feature vectors of the context and of the real word sequence into a second-stage Transformer decoder, which decodes them to generate the final reply.
2. The dialogue generation method of claim 1, wherein in step (1), the word embedding vector to which the ith word in the text is mapped is represented as:

Emb(x_i) = e(x_i) + PE(i)

where i denotes the position of the word in the text; x_i denotes the ith word in the text; e(x_i) denotes the initial word embedding vector to which x_i is mapped; and PE(i) denotes the sinusoidal position encoding function, which encodes position i into a vector so that the model can exploit the order information of the text. Specifically, let d be the dimension of the final vector; the even vector positions take the value PE(i, 2z) = sin(i / 10000^(2z/d)) and the odd vector positions take the value PE(i, 2z+1) = cos(i / 10000^(2z/d)), where z ∈ [0, ..., d/2], finally yielding a position code of dimension d.
3. The dialogue generation method according to claim 1, wherein in step (2), the extracted context feature vector is:

PSAE(U) = FFN(G)
G = MultiHead(In_s, In_s, In_s)
In_s = [s_1, s_2, ..., s_I]

where In_s = [s_1, s_2, ..., s_I] denotes the matrix formed by all word embedding vectors of the input text, whose length is I; MultiHead(In_s, In_s, In_s) denotes feeding In_s into a multi-head self-attention function as Q, K and V, with G the final output feature of that function; and FFN(G) denotes feeding the feature G into the fully connected layer FFN, giving the final context feature vector PSAE(U) = FFN(G), where U = [x_1, ..., x_L] denotes the input dialogue context and L is its total number of words.
4. The dialog generation method according to claim 1, wherein in step (3), when generating the ith real word c_i, the matrix formed by the word embedding vectors of the already generated real word sequence c_{≤i-1} is represented as:
In_c^{(i)} = [c_1, c_2, ..., c_{i-1}]
In the decoding process of the first-stage Transformer decoder, two multi-head attention layers are required, namely:
A_1^{(i)} = MultiHead(In_c^{(i)}, In_c^{(i)}, In_c^{(i)})
H_1^{(i)} = MultiHead(A_1^{(i)}, PSAE(U), PSAE(U))
wherein the subscript 1 denotes the first-stage decoding process, and PSAE(U) represents the extracted context feature vector;
to obtain the ith real word c_i, H_1^{(i)} must further pass through the fully connected layer and the softmax layer, with the formulas:
F_1^{(i)} = FFN(H_1^{(i)})
P(c_i) = softmax(F_1^{(i)})
finally, the ith real word c_i is generated according to the probability distribution P(c_i);
by analogy, the first-stage Transformer decoder finally decodes to generate the whole real word sequence C = [c_1, ..., c_K], wherein K is the total number of real words in the real word sequence.
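The two attention layers and the output softmax of claim 4 can be sketched as one decoding step. Single-head attention stands in for MultiHead here and a single linear map stands in for the FFN, so this is a structural illustration under those simplifying assumptions, not the patented implementation:

```python
import numpy as np

def attend(Q, K, V):
    """Scaled dot-product attention; one head stands in for MultiHead."""
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def first_stage_step(In_prev, ctx_feat, W_vocab):
    """One step of the first-stage decoder:
    self-attention over the already generated words, cross-attention to
    the context features PSAE(U), then a linear map (standing in for the
    FFN) and softmax giving the distribution P(c_i) over the vocabulary."""
    A1 = attend(In_prev, In_prev, In_prev)
    H1 = attend(A1, ctx_feat, ctx_feat)
    F1 = H1[-1] @ W_vocab          # logits for the next position only
    e = np.exp(F1 - F1.max())
    return e / e.sum()
```

The second-stage step of claim 6 has the same shape, with the cross-attention reading the real-word-sequence features instead of the context features.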
5. The dialog generation method according to claim 1, wherein in step (4), the feature vector extracted by the real-word-sequence self-attention encoder is:
CwSAE(C) = FFN(G)
G = MultiHead(In_c, In_c, In_c)
In_c = [c_1, c_2, ..., c_K]
wherein In_c = [c_1, c_2, ..., c_K] represents the matrix formed by all word embedding vectors in the real word sequence; MultiHead(In_c, In_c, In_c) and FFN(G) denote the multi-head self-attention function and the fully connected function respectively, yielding the final real-word-sequence feature vector CwSAE(C).
6. The dialog generation method of claim 1, wherein in step (5), when generating the ith reply word r_i, the matrix formed by the word embedding vectors of the already generated reply sequence r_{≤i-1} is represented as:
In_r^{(i)} = [r_1, r_2, ..., r_{i-1}]
In the decoding process of the second-stage Transformer decoder, two multi-head attention layers are required, namely:
A_2^{(i)} = MultiHead(In_r^{(i)}, In_r^{(i)}, In_r^{(i)})
H_2^{(i)} = MultiHead(A_2^{(i)}, CwSAE(C), CwSAE(C))
wherein the subscript 2 denotes the second-stage decoding process, and CwSAE(C) represents the extracted feature vector of the real word sequence;
to obtain the ith reply word r_i, H_2^{(i)} must further pass through the fully connected layer and the softmax layer, with the formulas:
F_2^{(i)} = FFN(H_2^{(i)})
P(r_i) = softmax(F_2^{(i)})
finally, the ith reply word r_i is generated according to the probability distribution P(r_i);
by analogy, the second-stage Transformer decoder finally decodes to generate the whole reply sequence R = [r_1, ..., r_J], wherein J is the total number of reply words in the reply sequence.
7. The dialog generation method according to claim 1, wherein the model is trained on the context sentences of existing dialogs together with the real word sequences and dialog replies obtained in steps (1) to (5), so as to obtain an optimal model;
in the training process, the loss functions of the two decoding stages are calculated separately:
L_1 = -Σ_{i=1}^{K} log P(c_i)
L_2 = -Σ_{i=1}^{J} log P(r_i)
and the joint loss function is obtained as:
L_total = L_1 + L_2
and then the model parameters are updated by back-propagation based on the joint loss function L_total, obtaining the final optimal dialog generation model.
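The patent does not spell out the per-stage losses; assuming the usual negative log-likelihood over the step distributions P(c_i) and P(r_i), the joint loss can be sketched as:

```python
import numpy as np

def stage_loss(step_probs, target_ids):
    """Negative log-likelihood of one decoded sequence:
    -sum_i log P(target_i), with P taken from each step's softmax output."""
    return -sum(np.log(p[t]) for p, t in zip(step_probs, target_ids))

def joint_loss(stage1_probs, real_words, stage2_probs, reply_words):
    # L_total = L_1 + L_2, summed over the two decoding stages
    return (stage_loss(stage1_probs, real_words)
            + stage_loss(stage2_probs, reply_words))
```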
8. A dialog generating device based on two-stage decoding is characterized by comprising a mapping module, a context self-attention encoder, a first-stage Transformer decoder, a real word sequence self-attention encoder and a second-stage Transformer decoder which are sequentially connected, wherein the context self-attention encoder is also connected to the second-stage Transformer decoder;
the mapping module is used for taking a text of a conversation context as input and mapping each word in the text into a word embedding vector;
the context self-attention encoder is used for taking a sentence as a unit, embedding words into a vector as input, and extracting a feature vector of a context;
the first-stage Transformer decoder is used for taking the context feature vector as input and decoding to generate the real word sequence, which expresses the main semantic information of the final reply;
the real word sequence self-attention encoder is used for inputting the real word sequence and encoding to obtain a characteristic vector of the real word sequence;
and the second-stage Transformer decoder is used for taking the feature vector of the context and the feature vector of the real word sequence as input and decoding to generate the final reply.
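The wiring of the five modules in claim 8 can be sketched as a pipeline; the module internals (defined by the earlier method claims) are passed in as callables, and all names are illustrative:

```python
class TwoStageDialogGenerator:
    """Data flow of the five modules in claim 8 (internals are stubs)."""

    def __init__(self, embed, context_encoder, stage1_decoder,
                 realword_encoder, stage2_decoder):
        self.embed = embed                        # mapping module
        self.context_encoder = context_encoder    # context self-attention encoder
        self.stage1_decoder = stage1_decoder      # first-stage Transformer decoder
        self.realword_encoder = realword_encoder  # real-word-sequence encoder
        self.stage2_decoder = stage2_decoder      # second-stage Transformer decoder

    def reply(self, context_text):
        embeddings = self.embed(context_text)
        ctx_feat = self.context_encoder(embeddings)
        real_words = self.stage1_decoder(ctx_feat)
        rw_feat = self.realword_encoder(real_words)
        # the second-stage decoder sees both feature vectors, as in claim 1
        return self.stage2_decoder(ctx_feat, rw_feat)
```

With toy stub modules, reply() exercises the full map, encode, decode, encode, decode chain end to end.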
9. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the dialog generation method based on two-stage decoding according to any one of claims 1 to 7.
10. A computing device comprising a processor and a memory for storing processor-executable programs, wherein the processor, when executing a program stored in the memory, implements the two-stage decoding-based dialog generation method of any of claims 1 to 7.
CN202110248798.1A 2021-03-08 2021-03-08 Dialog generation method and device based on two-stage decoding, medium and computing equipment Pending CN112988967A (en)

Publications (1)

Publication Number: CN112988967A, Publication Date: 2021-06-18



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210618