CN112925896A - Topic extension emotional dialogue generation method based on joint decoding - Google Patents
- Publication number
- CN112925896A (application CN202110364233.XA)
- Authority
- CN
- China
- Prior art keywords
- emotion
- content
- subject
- decoder
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a topic-extension emotional dialogue generation method based on joint decoding, belonging to the field of natural language processing. The method comprises the following steps: a joint content-and-topic attention vector is sent to the content decoding unit of the decoder, which improves the topic consistency of the generated reply; the specified emotion category is embedded as an additional input into the emotion decoding unit of the decoder, so that the generated reply contains richer emotional wording while the loss of content relevance is alleviated to a certain extent. The invention can not only generate dialogue of a specific emotion category but also ensure that the generated reply stays under the same topic as the input dialogue.
Description
Technical Field
The invention relates to the technical field of non-task-oriented dialogue generation systems, and in particular to a topic-extension emotional dialogue reply generation method based on joint decoding.
Background
In recent years, as human-machine dialogue systems have advanced, their range of application has expanded. Early task-oriented dialogue systems helped customers meet specific goals but could not respond to users on a deeper emotional level. Today, deep learning techniques are used to build open-domain, non-task-oriented dialogue systems, and researchers treat emotion perception as an important component: the agent should chat with the user in daily conversation, perceive the user's emotion, and respond with an appropriate emotional reaction. Zhou et al. generated various emotional responses under emotion supervision in dialogue, showing that emotion is a high-level abstract expression and that a dialogue system with emotion improves user experience and satisfaction; user needs have accordingly shifted from rich reply content toward deeper, mental-level communication. Furthermore, daily chat shows that conversation between people involves not only the content of each utterance but also consistency of the conversation topic. Many problems remain open, so improving the human-likeness and conversational ability of dialogue generation models is still a main research direction for dialogue systems.
In 2017, Zhou et al. first proposed the Emotional Chatting Machine (ECM), based on a memory network, which can generate different emotional responses according to a specified emotion category. On this basis, variants of the seq2seq dialogue generation model were later proposed: Peng et al. used a TE-ECG model with emotion computing to dynamically generate emotions for a specified emotion category; Yang et al. used a fusion module to expand topic words and thereby improve content richness; Liang et al. predicted appropriate emotions through a heterogeneous neural network combined with multimodal data; and Zhang et al. used multiple embedding fusion layers to generate high-quality content. However, these works neglect the weakening of emotional expression caused by adding topic information at the encoder. To solve this problem, we propose a method that both ensures the diversity of the chat topic and generates rich content under a specified emotion.
Disclosure of Invention
The invention aims to provide a topic-extension emotional dialogue generation method based on joint decoding, designed to overcome the defects of the prior art. First, topic words for the input sequence are obtained with a Twitter LDA model. Then, the joint content-and-topic attention is used as the input of the content decoding unit, ensuring that the generated content and the input content are consistent in topic relevance. Finally, the specified emotion category is embedded as an additional input into the emotion decoding unit of the decoder, reducing the degradation of content expression caused by adding emotion to the model. The invention is simple in design, concise, convenient and practical, and satisfies two requirements of a dialogue generation system: it can generate dialogue of a specific emotion category, and it ensures that the generated reply stays under the same topic as the input dialogue.
To achieve the above object, the present invention provides a new dialogue generation method, characterized by a generative model composed of an encoder, a joint attention mechanism and a decoder; generating a specific reply comprises the following steps.
Step one: semantic information of the input dialogue is obtained with a BiLSTM encoder, topic words are extracted with a Twitter LDA model, and the input content and topic words are weighted with an attention mechanism. When extracting topic words, the parameters of the Twitter LDA topic model are first estimated with a Gibbs sampling algorithm; the model then assigns the extracted topic to the source sequence, the top m keywords with the highest probability under that topic are selected (m is set to 10), and meaningless common words such as 'good' and 'our' are deleted.
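The keyword-selection part of this step can be sketched as follows. This is an illustrative example only: the word-probability table below is hand-made, not real Twitter LDA output, and `select_topic_words` is a hypothetical helper name.

```python
# Sketch of the topic-word selection step: given a word-probability
# distribution for the topic assigned to the source sequence, keep the
# top-m words and drop meaningless common words such as 'good' and 'our'.

def select_topic_words(topic_word_probs, m=10, stopwords=frozenset({"good", "our"})):
    """Return up to m highest-probability words, excluding stopwords."""
    ranked = sorted(topic_word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, _ in ranked if w not in stopwords][:m]

# Hand-made probabilities standing in for an estimated topic-word distribution.
probs = {"music": 0.21, "good": 0.18, "concert": 0.15, "our": 0.12,
         "guitar": 0.10, "band": 0.08, "song": 0.07, "stage": 0.05}
print(select_topic_words(probs, m=5))
# → ['music', 'concert', 'guitar', 'band', 'song']
```

In the actual method the distribution would come from a Twitter LDA model whose parameters are estimated by Gibbs sampling; only the filtering and top-m ranking are shown here.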
Step two: the class of the specified emotion is embedded as an additional input into the emotion independent unit of the decoder. Each randomly initialized emotion category is represented using a low-dimensional vector one-hot.
Step three: and combining the attention mechanism of the content and the theme with the hidden state output by the content independent unit of the decoder and the output at the moment of i-1, splicing the hidden state and the output together, and sending the spliced hidden state and the output to the content independent unit of the decoder to ensure that the content related to the theme is output, and finally smoothly fusing the emotion independent unit and the content independent unit.
The invention has the beneficial effects that:
1) Practicality: by introducing topic information into the content decoding unit of the decoder, the generated dialogue is better aligned in topic relevance with the input dialogue. Meanwhile, emotion information is fed as additional input to the emotion decoding unit of the decoder, and finally the emotion decoding unit and the content decoding unit are smoothly fused. This not only ensures the diversity of chat content but also generates replies of the specified emotion category.
2) Correctness: the emotion decoding unit and the content decoding unit of the decoder carry the emotion category and the content of the dialogue respectively, so the generated reply contains richer emotional wording, and the loss of content relevance is alleviated to a certain extent.
3) Simplicity: the design is simple and the content concise, giving the method broad practical significance.
Drawings
FIG. 1 is a schematic diagram of a new generative model in an embodiment of the method of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the dialog generating method of the present embodiment includes the following steps:
Step one: semantic information of the input dialogue is obtained with a BiLSTM encoder, topic words are extracted with a Twitter LDA model, and the input content and topic words are weighted with an attention mechanism. When extracting topic words, the parameters of the Twitter LDA topic model are first estimated with a Gibbs sampling algorithm; the model then assigns the extracted topic to the source sequence, the top m keywords with the highest probability under that topic are selected (m is set to 10), and meaningless common words such as 'good' and 'our' are deleted;
A given source-sequence dialogue message X = {x_1, x_2, ..., x_m} is mapped by word embedding to a target-sequence reply Y = {y_1, y_2, ..., y_m}; the generation probability of the target sequence Y is p(y_1, y_2, ..., y_m | x_1, x_2, ..., x_m). The forward LSTM encodes the word vectors into forward hidden states and the backward LSTM encodes them into backward hidden states; the two are spliced to obtain the final hidden state h_i. Meanwhile, unlike the traditional Seq2Seq model, attention is computed over each hidden state output by the encoder to obtain the content attention vector c_i;
To better account for topic relevance and diversity, a Twitter LDA topic model is introduced at the encoder to extract topic words as additional model input. Compared with the traditional attention mechanism, a dynamic attention mechanism combining topic and content is adopted, which strengthens the weight of topic words in reply generation, makes the topic vector more relevant to the content of the input message, and thus increases the probability of topic words appearing in the generated reply. The parameters of the Twitter LDA topic model are estimated with a Gibbs sampling algorithm; the model then assigns the extracted topic to the source sequence, the top m keywords with the highest probability under that topic are selected (m is set to 10), and meaningless common words such as 'good' and 'our' are deleted. Because a vector representation of each topic word is needed during learning, this distribution is used as the topic-word vector representation. Meanwhile, the word vectors of the topic words {topic_1, ..., topic_m} are passed through an attention computation to obtain the topic attention vector o_i. Finally, the splice of the content attention vector c_i and the topic attention vector o_i is sent to the decoder as the input of the content decoding unit;
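The joint content-and-topic attention input can be sketched as follows. Again an illustrative shape-level example: encoder states, topic-word vectors, and the shared query are random stand-ins, and sharing one query across both attentions is an assumption.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
d = 8
H = rng.normal(size=(6, d))       # encoder hidden states h_1..h_T
K = rng.normal(size=(10, d))      # topic-word vectors topic_1..topic_m (m = 10)
query = rng.normal(size=d)        # previous decoder state as the shared query

c_i = softmax(H @ query) @ H      # content attention vector
o_i = softmax(K @ query) @ K      # topic attention vector

# Splice of the two attention vectors: the input of the content decoding unit.
decoder_input = np.concatenate([c_i, o_i])
print(decoder_input.shape)        # → (16,)
```

Computing `o_i` over the topic-word vectors alongside `c_i` is what raises the weight of topic words at each decoding step relative to content-only attention.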
step two: embedding the category of the appointed emotion into an emotion independent unit of a decoder as an additional input, wherein each randomly initialized emotion category is represented by a low-dimensional vector one-hot;
While ensuring that the topic in the generated response stays consistent with the input via c_i, the generated reply should also carry emotion. Unlike the content decoding unit, the emotion decoding unit additionally takes the emotion category embedding d_(i-1) as input. A BiLSTM emotion classifier is used to label the dialogue data with six emotion categories: happiness, anger, sadness, like, disgust and other. Each emotion category is represented by a randomly initialized low-dimensional vector selected by its one-hot index, and the emotion category of the desired response is fed as additional input, so that the emotion decoding unit learns a high-level abstract representation of emotion under the guidance of the specified emotion category, reducing the loss of content and topic accuracy caused by adding emotional factors to the model;
step three: combining the attention mechanism of the content and the theme with the hidden state output by the content independent unit of the decoder and the output at the moment of i-1, splicing the hidden state and the output at the moment of i-1, and sending the spliced hidden state and the output to the content independent unit of the decoder so as to ensure that the content related to the theme is output, and finally smoothly fusing the emotion independent unit and the content independent unit;
The model uses two layers of GRUs as the joint decoder of content and emotion: the first-layer GRU splits the decoded hidden state into a content module and an emotion module, and the second-layer GRU fuses the hidden states of the two modules;
The first-layer decoding unit of the model consists of two GRUs: a content decoding unit, which keeps the content of the response consistent with the content of the input dialogue, and an emotion decoding unit, which merges all emotion categories into one unit so that different emotions can be distinguished by the specified emotion-category input and the ability to express emotion in the response can be learned. Finally, the decoder splices the hidden state s_g(i) of the content decoding unit and the hidden state s_a(i) of the emotion decoding unit at the i-th time step into the hidden state s_1(i) of the first-layer decoding unit;
The second-layer decoding unit of the model smoothly fuses the hidden states of the first-layer decoding unit and updates the state: the hidden state of the second-layer decoder at the i-th time step is computed by a GRU from its hidden state at step i-1 and the first-layer hidden state s_1(i); the final hidden state of the decoder is denoted s_2(i);
Finally, the decoder obtains the probability distribution over the target sequence through a fully connected layer followed by a Softmax function.
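The two-layer fusion and output projection described above can be sketched with a minimal NumPy GRU cell. All weights and states below are random stand-ins at illustrative sizes; the GRU gating follows the standard formulation, which the patent does not spell out.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gru_cell(x, h_prev, p):
    """Minimal standard GRU update; p holds the weight matrices."""
    z = 1 / (1 + np.exp(-(p["Wz"] @ x + p["Uz"] @ h_prev)))   # update gate
    r = 1 / (1 + np.exp(-(p["Wr"] @ x + p["Ur"] @ h_prev)))   # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev))   # candidate state
    return (1 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(3)
d, vocab = 4, 20
# W* act on the spliced 2d-dim input, U* on the d-dim recurrent state.
p = {k: rng.normal(scale=0.1, size=(d, 2 * d if k[0] == "W" else d))
     for k in ("Wz", "Wr", "Wh", "Uz", "Ur", "Uh")}

s_g = rng.normal(size=d)              # content decoding unit state s_g(i)
s_a = rng.normal(size=d)              # emotion decoding unit state s_a(i)
s1 = np.concatenate([s_g, s_a])       # first-layer state s_1(i)
s2_prev = rng.normal(size=d)          # second-layer state at step i-1
s2 = gru_cell(s1, s2_prev, p)         # fused final state s_2(i)

W_out = rng.normal(scale=0.1, size=(vocab, d))
probs = softmax(W_out @ s2)           # distribution over the vocabulary
print(s2.shape, probs.shape)
```

The splice-then-fuse structure lets the second-layer GRU trade off the content and emotion signals smoothly instead of choosing between two separate decoders at output time.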
Claims (2)
1. A new dialogue generation method, characterized in that the generated content not only carries the specified emotion category but also keeps its topic consistent with the input dialogue, and richer content can be generated in the response;
the method comprises the following steps:
Step one: semantic information of the input dialogue is obtained with a BiLSTM encoder, topic words are extracted with a Twitter LDA model, and the input content and topic words are weighted with an attention mechanism; when extracting topic words, the parameters of the Twitter LDA topic model are first estimated with a Gibbs sampling algorithm, the model then assigns the extracted topic to the source sequence, the top m keywords with the highest probability under that topic are selected (m = 10), and meaningless common words such as 'good' and 'our' are deleted;
a given source-sequence dialogue message is mapped by word embedding to a target-sequence response, and the hidden states of the forward-LSTM and backward-LSTM word vectors are spliced to obtain the final hidden state; meanwhile, unlike the traditional Seq2Seq model, attention is computed over each hidden state output by the encoder to obtain the content attention vector;
to better account for topic relevance and diversity, a Twitter LDA topic model is introduced at the encoder to extract topic words as additional model input; compared with the traditional attention mechanism, a dynamic attention mechanism combining topic and content is adopted, which strengthens the weight of topic words in reply generation, makes the topic vector more relevant to the content of the input message, and thus increases the probability of topic words appearing in the generated reply; the topic model assigns the extracted topic to the source sequence, the top m keywords with the highest probability under that topic are selected, and meaningless common words are deleted; because a vector representation of each topic word is needed during learning, this distribution is used as the topic-word vector representation; meanwhile, the topic attention vector is obtained from the topic-word vectors through an attention computation; finally, the spliced content attention vector and topic attention vector are sent to the decoder as the input of the content decoding unit;
step two: embedding the category of the appointed emotion into an emotion independent unit of a decoder as an additional input, wherein each randomly initialized emotion category is represented by a low-dimensional vector one-hot;
when the theme in the generated response is ensured to be consistent with the input, the generated reply is also provided with emotion, which is different from the content decoding unit that the emotion decoding unit adds emotion categories, and a BilSTM emotion classifier is used for labeling the dialogue data, and the emotion is divided into six types: happiness, anger, sadness, likes, dislikes and others; each randomly initialized emotion category is represented by using a low-dimensional vector one-hot, and the emotion category generating response is used as additional input, so that the emotion decoding unit learns the high-level abstract expression capability of the emotion under the guidance of the specified emotion category, and the inaccuracy of response to content and theme due to the addition of emotional factors in the model is reduced;
step three: combining the attention mechanism of the content and the theme with the hidden state output by the content independent unit of the decoder and the output at the moment of i-1, splicing the hidden state and the output at the moment of i-1, and sending the spliced hidden state and the output to the content independent unit of the decoder so as to ensure that the content related to the theme is output, and finally smoothly fusing the emotion independent unit and the content independent unit;
the model uses two layers of GRUs as joint decoders of content and emotion respectively, firstly, the first layer of GRU divides a decoded hidden state into two modules of content and emotion, and secondly, the second layer of GRU fuses the hidden states of the two modules;
the first layer decoding unit of the model is composed of two GRUs, one is a content decoding unit which enables the content in the response to be consistent with the content of the input dialogue, the other is an emotion decoding unit which synthesizes all kinds of emotions into one unit, and different emotions can be distinguished through the input of appointed emotion types, and the ability of emotion expression in the response can be learned; finally, the decoder splices the hidden states of the content decoding unit and the emotion decoding unit of the ith time step into the hidden state of the first layer decoding unit of the ith time step;
the second layer decoding unit of the model smoothly fuses the hidden states of the first layer decoding unit and updates the states, and the hidden state of the second layer decoder at the ith time step is obtained by calculating the hidden state at the step i-1 and the hidden state of the first layer decoder through a neural network GRU;
and finally, sequentially obtaining the probability distribution of the target sequence by the decoder through full connection and a Softmax function.
2. The dialogue generation method according to claim 1, characterized in that the degradation of content expression caused by added factors such as topic and emotion is reduced; the specific steps include:
weighting the input content and topic words with an attention mechanism; splicing the content attention vector, the topic attention vector, the final hidden state output by the decoder and the output at time i-1 together and sending them into the content decoding unit of the decoder, so that the weight of topic words in reply generation is strengthened, the topic vector is made more relevant to the content of the input message, and the probability of topic words appearing in the generated reply is further increased.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110364233.XA (CN112925896A) | 2021-04-04 | 2021-04-04 | Topic extension emotional dialogue generation method based on joint decoding
Publications (1)
Publication Number | Publication Date
---|---
CN112925896A | 2021-06-08
Family
ID=76174094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110364233.XA Pending CN112925896A (en) | 2021-04-04 | 2021-04-04 | Topic extension emotional dialogue generation method based on joint decoding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112925896A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180129931A1 (en) * | 2016-11-04 | 2018-05-10 | Salesforce.Com, Inc. | Quasi-recurrent neural network based encoder-decoder model |
US20180285348A1 (en) * | 2016-07-19 | 2018-10-04 | Tencent Technology (Shenzhen) Company Limited | Dialog generation method, apparatus, and device, and storage medium |
US20180300400A1 (en) * | 2017-04-14 | 2018-10-18 | Salesforce.Com, Inc. | Deep Reinforced Model for Abstractive Summarization |
CN111522924A (en) * | 2020-03-31 | 2020-08-11 | 华东师范大学 | Emotional chat type reply generation method with theme perception |
CN111949761A (en) * | 2020-07-06 | 2020-11-17 | 合肥工业大学 | Dialogue question generation method and system considering emotion and theme, and storage medium |
- 2021-04-04: CN application CN202110364233.XA filed; publication CN112925896A, status Pending
Non-Patent Citations (2)
Title |
---|
Peng Yehong: "Research on Emotional Dialogue Generation Technology Based on Topic Models and Variational Autoencoders" (基于主题模型与变分自编码的情感对话生成技术研究), China Master's Theses Full-text Database, 15 January 2020 (2020-01-15), pages 21-27 *
Li Meng: "Research on Emotional Dialogue Generation Models Based on Deep Learning" (基于深度学习的情感对话生成模型研究), China Master's Theses Full-text Database, 15 January 2020 (2020-01-15), pages 22-27 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Han et al. | Adversarial training in affective computing and sentiment analysis: Recent advances and perspectives | |
CN110717017B (en) | Method for processing corpus | |
Cassell et al. | Beat: the behavior expression animation toolkit | |
JP6889281B2 (en) | Analyzing electronic conversations for presentations in alternative interfaces | |
Cavazza et al. | Dialogue generation in character-based interactive storytelling | |
CN107870977A (en) | Chat robots output is formed based on User Status | |
Deldjoo et al. | Towards multi-modal conversational information seeking | |
CN113762322A (en) | Video classification method, device and equipment based on multi-modal representation and storage medium | |
CN113407663B (en) | Image-text content quality identification method and device based on artificial intelligence | |
GB2581943A (en) | Interactive systems and methods | |
CN113392261B (en) | Conversational music recommendation method based on film and television theme | |
Kao et al. | Model of multi-turn dialogue in emotional chatbot | |
CN116028846A (en) | Multi-mode emotion analysis method integrating multi-feature and attention mechanisms | |
CN112819933A (en) | Data processing method and device, electronic equipment and storage medium | |
Petrova | Meme language, its impact on digital culture and collective thinking | |
CN116894085A (en) | Dialog generation method and device, electronic equipment and storage medium | |
CN111522924A (en) | Emotional chat type reply generation method with theme perception | |
US20220253609A1 (en) | Social Agent Personalized and Driven by User Intent | |
CN117173497A (en) | Image generation method and device, electronic equipment and storage medium | |
CN117011875A (en) | Method, device, equipment, medium and program product for generating multimedia page | |
CN112925896A (en) | Topic extension emotional dialogue generation method based on joint decoding | |
CN116415596A (en) | Emotion support man-machine conversation method and system based on emotion strategy matching | |
KR20230130580A (en) | Autonomous generation, deployment, and personalization of real-time interactive digital agents | |
Jbene et al. | User sentiment analysis in conversational systems based on augmentation and attention-based bilstm | |
Wanner et al. | Towards a multimedia knowledge-based agent with social competence and human interaction capabilities |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210608 |