CN111241250B - Emotion dialogue generation system and method - Google Patents


Info

Publication number
CN111241250B
Authority
CN
China
Prior art keywords
emotion
model
reply
words
generation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010074840.8A
Other languages
Chinese (zh)
Other versions
CN111241250A (en)
Inventor
窦志成 (Dou Zhicheng)
Current Assignee
Renmin University of China
Original Assignee
Renmin University of China
Priority date
Filing date
Publication date
Application filed by Renmin University of China filed Critical Renmin University of China
Priority to CN202010074840.8A priority Critical patent/CN111241250B/en
Publication of CN111241250A publication Critical patent/CN111241250A/en
Application granted granted Critical
Publication of CN111241250B publication Critical patent/CN111241250B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 16/338 Presentation of query results
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to an emotion dialogue generation system and method comprising an emotion dialogue generation module and a reordering module. The emotion dialogue generation module comprises a basic reply generation module, which generates a semantically correct basic reply; a multi-model emotion reply generation module, which builds an emotion model for each emotion category through hierarchical training and obtains emotion replies from these models; and a single-model emotion reply generation module, which trains a single emotion model that takes the emotion category as input and outputs emotion replies accordingly. The reordering module receives the replies output by the three sub-modules of the dialogue generation module, scores them, and reorders them by score; the reply with the highest score is the final emotion reply. During man-machine dialogue the machine can thus generate replies that carry a specified emotion while remaining grammatically fluent, semantically relevant and emotionally consistent, which improves the user experience of man-machine interaction.

Description

Emotion dialogue generation system and method
Technical Field
The application relates to an emotion dialogue generation system and method, and belongs to the technical field of artificial intelligence.
Background
With the advent of man-machine interaction products such as the Siri and Google Home personal assistants and the Tmall Genie smart speaker, man-machine interaction has received more and more attention from industry and academia and has a growing influence on people's daily lives. Many methods for implementing a dialogue system already exist, but most such systems are task-oriented, i.e., man-machine interaction is performed by issuing commands. With the development of the technology, however, purely task-oriented dialogue no longer satisfies people's needs: people hope that a robot's replies can be more fluent, conform to human speaking habits, distinguish the emotional color of language and respond appropriately. This gave rise to the man-machine chit-chat mode of dialogue.
Chit-chat conversations usually have no fixed topic area, which makes it difficult for a machine to select appropriate reply content. Some researchers convert the problem into a matching problem: candidate replies are obtained through a retrieval system, and the appropriate reply is then chosen with a text-matching algorithm. Other researchers focus on making the machine's replies conform to the context of daily conversation and human expression habits. Retrieval-based or template-based dialogue models avoid irrelevant answers and grammatical errors, but in practical application they are limited by the quality of the matching model and by the fact that no system can contain all reply sentences or templates. As a result, the machine's replies are often stiff and formulaic, semantically unsound, and without emotional color. Especially for newer vocabulary and expressions, and for inputs with obvious emotional color, no semantically acceptable answer can be given.
Disclosure of Invention
In view of the above shortcomings of the prior art, the purpose of the present application is to provide an emotion dialogue generation system and method that enable a machine to generate dialogue replies carrying a specified emotion during man-machine dialogue, so that the replies are at once grammatically fluent, semantically relevant and emotionally consistent, thereby improving both the quality of machine-generated replies and the user experience of man-machine interaction.
In order to achieve the above object, the present application provides an emotion dialogue generation system comprising an emotion dialogue generation module and a reordering module. The emotion dialogue generation module comprises a basic reply generation module, which generates a semantically correct basic reply by detecting the entity words of the input sentence; a multi-model emotion reply generation module, which builds emotion models corresponding to the various emotions through hierarchical training and fine-tunes the basic reply based on these emotion models to obtain an emotion reply; and a single-model emotion reply generation module, which trains an emotion model that takes the emotion category as input and outputs an emotion reply. The reordering module receives the replies output by the three sub-modules of the dialogue generation module, scores them, and reorders them by score; the reply with the highest score is the final emotion reply.
Further, the basic reply generation module identifies entity words with practical meaning in the input sentence based on a rule generation model in which manually written reply templates are embedded; it selects a reply template according to the entity words and determines the semantically correct basic reply according to that template.
Further, the multi-model emotion reply generation module and the single-model emotion reply generation module generate models with emotion categories through a Seq2Seq model comprising an encoder and a decoder. The encoder converts the input sentence X into a sequence of intermediate-state vectors H = (h_1, h_2, ..., h_n), and the decoder decodes this intermediate representation H into the output sentence Y of the emotion model.
Further, an attention mechanism is introduced into the Seq2Seq model to enrich the input information of the decoder. With attention, the decoder decodes using the following formulas:
e_ij = s_{i-1}^T W_a h_j
α_ij = exp(e_ij) / Σ_{k=1}^{n} exp(e_ik)
c_i = Σ_{j=1}^{n} α_ij h_j
s_i = GRU_decoder(y_{i-1}, s_{i-1}, c_i)
where i indexes decoder time steps and j encoder time steps; s_i is the hidden state of the decoder at time i in the decoding process; h_j is the intermediate state of the encoder at time j in the encoding process; e_ij is the attention importance computed from the decoder hidden state s_{i-1} at the previous time step and the encoder intermediate states h_j at the different times j, with W_a a learned parameter matrix; α_ij is the normalized attention weight assigned to the encoder intermediate state at time j; n is the length of the input; c_i is the context vector obtained as the attention-weighted sum of all encoder intermediate states; and y_i is the word vector of the word generated at time i.
Further, the multi-model emotion reply generation module comprises a plurality of Seq2Seq generation models corresponding to the emotion categories, generated as follows. First, a general model is trained on the whole corpus. Second, the corpus is divided into positive-emotion and negative-emotion corpora according to emotion category, and the general model is fine-tuned on each, yielding a positive emotion model and a negative emotion model. Finally, the positive corpus is divided into happy and like corpora, on which the positive emotion model is fine-tuned into a happy model and a like model; the negative corpus is divided into disgust, sad and anger corpora, on which the negative emotion model is fine-tuned into a disgust model, a sad model and an anger model.
Further, the single-model emotion reply generation module comprises only one Seq2Seq model, in which the emotion category i is converted into an emotion vector e_i; e_i is added into the decoding process of the decoder so that the Seq2Seq model contains the information of the emotion category.
Further, the Seq2Seq model adopts a copy mechanism to raise the generation probability of emotion words during reply generation. The specific process is as follows: the words obtained from the input are divided into emotion words and non-emotion words; the emotion words are converted into emotion word vectors, which interact with the current hidden state s_i of the decoder to give the generation probability of the emotion words; this probability is then added to the ordinary generation probability, so that the probability of generating emotion words in the reply increases. The generation probability of the decoder is expressed as follows:
p(y_i | s_i) = softmax(p_ori(y_i | s_i) + p_copy(y_i | s_i, E))
p_copy(y_i | s_i, E) = softmax(E W_e s_i)
where E is the matrix of word vectors of all emotion words; W_e is a learned parameter matrix; y_i is the index of the word generated at time i; s_i is the intermediate state vector of the decoder at time i; p_ori is the generation probability of word y_i under state s_i for the original decoder; and p_copy is the additional copy probability of emotion word y_i under state s_i, which is 0 when y_i is not an emotion word.
Further, the scoring mechanism of the reordering module comprises an emotion consistency score and a semantic consistency score.
For the emotion consistency score, a different emotion dictionary is constructed for each emotion category; each dictionary gives the emotion words under that category and the emotion score of each word. The emotion consistency score is calculated as follows:
E(y) = Σ_{m=1}^{M} γ_m · E_m, with E_m = w_m · Π_{j ∈ ind(m-1, m)} ω_j
where M is the number of emotion words; E(y) and E_m are the emotion scores of the candidate reply y and of emotion word m respectively; ind(m-1, m) is the span from the previous emotion word to the current emotion word m; ω_j is the weight of a degree adverb y_j within that span; w_m is the weight score of emotion word m in the emotion dictionary; and γ_m indicates whether emotion word m is in the dictionary of the target emotion category: it is set to 1 if m is in that dictionary, meaning m contributes positively to expressing the target emotion, and -1 otherwise, meaning m contributes negatively;
the formula for the semantic consistency score is as follows:
T(y)=Count(x,y)
wherein Count (·) is the number of identical terms of two sentences.
Further, the reordering module combines the emotion consistency score E(y) and the semantic consistency score T(y) into a total score:
Φ(y) = λ · E(y) + (1 - λ) · T(y)
where Φ(y) is the total score used for ranking and λ is a weight that balances the two scores.
The application also discloses an emotion dialogue generation method comprising the following steps: S1, generating a semantically correct basic reply by detecting the entity words of the input sentence; S2, establishing emotion models corresponding to the various emotions through hierarchical training, and fine-tuning the basic reply based on these emotion models to obtain replies containing the emotions; S3, training an emotion model that takes the emotion category as input and outputs an emotion reply; S4, receiving the replies output in steps S1-S3, scoring them and reordering them by score, the reply with the highest score being the final emotion reply.
Due to the adoption of the above technical scheme, the application has the following advantages:
(1) During man-machine dialogue, the machine can generate replies that carry a specified emotion and are at once grammatically fluent, semantically relevant and emotionally consistent, which improves both the quality of machine-generated replies and the user experience of man-machine interaction.
(2) Dialogue generation under different emotions is considered, specifically the five emotions like, sad, anger, disgust and happy. A basic reply generation module, a multi-model emotion reply generation module and a single-model emotion reply generation module each produce candidate replies, and the best one is selected by the reordering module as the final reply.
(3) An end-to-end model that directly generates each specified emotion is provided: emotion factors steer the model toward generating sentences in the direction of the specified emotion, while a copy mechanism explicitly raises the generation of emotion words, making the emotion expressed by the sentences richer.
(4) For the emotion dialogue problem, a way of measuring the emotion score of a sentence is designed. It can be combined with other sentence-level features to reorder the results of different emotion dialogue models.
Drawings
FIG. 1 is a diagram illustrating the structure of an emotion dialogue generation system according to an embodiment of the present application;
FIG. 2 is a logic diagram of the multi-model emotion reply generation module according to an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to specific embodiments so that those skilled in the art can better understand its technical direction. It should be understood, however, that the detailed description is presented only to provide a better understanding of the application and should not be taken to limit it. In the description of the present application, the terminology used is for the purpose of description only and is not to be interpreted as indicating or implying relative importance.
Example 1
The embodiment discloses an emotion dialogue generation system, as shown in fig. 1, including: the emotion dialogue generation module and the reordering module;
the emotion dialogue generation module comprises a basic reply generation module which generates a basic reply with correct semantics by detecting entity words of an input sentence; the multi-model emotion response generation module is used for establishing emotion models corresponding to various emotions through hierarchical training, and performing fine adjustment on basic response based on the emotion models to obtain emotion response; the single model emotion response generation module takes emotion types as input training emotion models and outputs emotion responses according to the emotion models;
the reordering module receives replies output by three sub-modules in the dialogue generating module, scores the replies output, reorders the replies output by each module according to the score, and the reply with the highest score is the final emotion reply.
With this system, dialogue replies carrying a specified emotion can be generated during man-machine dialogue, so that the replies are grammatically fluent, semantically relevant and emotionally consistent, which improves both the quality of machine-generated replies and the user experience of man-machine interaction.
The emotion dialogue generation process in this embodiment is as follows. The user's voice, video or typed input is first converted into machine-recognizable text X = (x_1, x_2, ..., x_n), where x_1 to x_n are the constituent elements of the text and may represent paragraphs, sentences, words, etc. The emotion dialogue generation module selects an emotion category for the text and, based on the selected category, generates an emotion reply Y = (y_1, y_2, ..., y_m) conforming to it. The emotion categories are {like, sad, anger, disgust, happy, other}. The generated emotion reply Y must be consistent with the emotion of the selected category, and must also be grammatically fluent and semantically relevant.
The emotion dialogue generation module comprises a basic reply generation module, a multi-model emotion reply generation module and a single-model emotion reply generation module.
The basic reply generation module identifies entity words with practical meaning in the input sentence based on a rule generation model in which manually written reply templates are embedded; it selects a reply template according to the entity words and determines the semantically correct basic reply according to that template. Because the reply templates are written by hand, the resulting basic reply conforms well to grammatical fluency, semantic relevance and emotion consistency. In this embodiment, the RUCNLP tool is used to extract the entity words in the sentence, and the detection of an entity word is used as the trigger point of the rule generation model; once triggered, the output of the rule generation model is used directly as the reply, which reduces the amount of computation in actual use.
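The trigger-and-template flow just described can be sketched as follows. The entity list, templates and naive substring extractor here are illustrative stand-ins only; the patent uses the RUCNLP tool and its own manually written templates:

```python
# Minimal sketch of the basic reply generation module: a reply template is
# selected only when an entity word is detected in the input (the trigger
# point); otherwise the rule model stays silent and the neural modules answer.
# The entity vocabulary and templates below are illustrative, not from the patent.
ENTITY_TEMPLATES = {
    "weather": "I heard the weather is nice today. What about {entity}?",
    "movie": "Oh, {entity}? I have wanted to watch that movie for a while!",
}

def extract_entities(sentence, vocab=ENTITY_TEMPLATES):
    # Stand-in for the RUCNLP entity extractor: naive substring matching.
    return [w for w in vocab if w in sentence]

def basic_reply(sentence):
    entities = extract_entities(sentence)
    if not entities:  # no trigger: fall back to the generative modules
        return None
    entity = entities[0]
    return ENTITY_TEMPLATES[entity].format(entity=entity)
```

In use, a `None` result signals that the rule model did not trigger and the Seq2Seq modules should produce the reply instead.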
The multi-model emotion reply generation module and the single-model emotion reply generation module generate models with emotion categories through a Seq2Seq model comprising an encoder and a decoder. The encoder converts the input sentence X into a sequence of intermediate-state vectors H = (h_1, h_2, ..., h_n), and the decoder decodes this intermediate representation H into the output sentence Y of the emotion model. This is typically implemented with long short-term memory units (LSTM) or gated recurrent units (GRU); this embodiment is described using a GRU as an example. The GRU is controlled by an update gate and a reset gate, computed as follows:
z = σ(W_z x_t + U_z h_{t-1})
r = σ(W_r x_t + U_r h_{t-1})
s = tanh(W_s x_t + U_s (h_{t-1} ⊙ r))
h_t = (1 - z) ⊙ h_{t-1} + z ⊙ s
where z is the update gate output; r is the reset gate output; s is the candidate cell state vector; tanh(·) and σ(·) are activation functions; ⊙ denotes the element-wise product of vectors; and W_z, W_r, W_s, U_z, U_r, U_s are the parameter matrices of the different gates, which map the input vector x_t at time t and the intermediate state h_{t-1} at the previous time into the same semantic space. Word vectors are randomly initialized and trained together with the model.
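The gate equations can be turned into a minimal pure-Python GRU step. For readability this sketch uses per-dimension (diagonal) weights instead of full parameter matrices, and it assumes the standard GRU candidate-state and interpolation equations for the parts the text only names through W_s and U_s:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x_t, h_prev, W, U):
    """One GRU update with per-dimension (diagonal) weights.

    W and U are dicts with keys 'z', 'r', 's' holding per-dimension
    weights, a simplification of the full matrices W_z, U_z, and so on.
    """
    n = len(h_prev)
    z = [sigmoid(W['z'][k] * x_t[k] + U['z'][k] * h_prev[k]) for k in range(n)]
    r = [sigmoid(W['r'][k] * x_t[k] + U['r'][k] * h_prev[k]) for k in range(n)]
    # candidate state s: the reset gate decides how much history leaks in
    s = [math.tanh(W['s'][k] * x_t[k] + U['s'][k] * (r[k] * h_prev[k]))
         for k in range(n)]
    # update gate interpolates between the old state and the candidate
    return [(1.0 - z[k]) * h_prev[k] + z[k] * s[k] for k in range(n)]
```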
The encoder and decoder computation can be expressed as:
h_t = GRU_encoder(x_t, h_{t-1})
s_t = GRU_decoder(y_{t-1}, s_{t-1})
p(y_t | s_t) = softmax(W_o s_t)
where p(y_t | s_t) is the generation probability over word vectors at decoder time t, and the word with the maximum probability is taken as the currently generated word y_t; h_t and s_t are the intermediate hidden states of the encoder and the decoder at time t; and W_o is a parameter matrix that maps the decoder state s_t into the vocabulary space at output time.
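The encoder-decoder loop can be sketched as a greedy decoding routine; `gru_encoder`, `gru_decoder` and `output_proj` are hypothetical stand-ins for the trained components:

```python
# Greedy Seq2Seq decoding as described: encode x_1..x_n, then at each step
# feed the previously generated word back in and pick the argmax of
# softmax(W_o s_t). The three callables are placeholders for trained parts.
def greedy_decode(x_seq, gru_encoder, gru_decoder, output_proj,
                  start_token, end_token, max_len=20):
    h = None
    for x_t in x_seq:                 # encoder pass over the input
        h = gru_encoder(x_t, h)
    s, y, out = h, start_token, []    # decoder starts from the final h_n
    for _ in range(max_len):
        s = gru_decoder(y, s)
        probs = output_proj(s)        # softmax(W_o s_t) over the vocabulary
        y = max(range(len(probs)), key=probs.__getitem__)
        if y == end_token:
            break
        out.append(y)
    return out
```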
Since the encoding process uses only the last encoder output h_n as the representation of the input sentence, and since at each time t the decoder output depends only on the previous state s_{t-1} and the word vector y_{t-1} of the previously generated word, the other information of the input sentence is neither fully utilized nor fully expressed. An attention mechanism is therefore introduced to enrich the input information of the decoding process. With attention, the decoder decodes using the following formulas:
e_ij = s_{i-1}^T W_a h_j
α_ij = exp(e_ij) / Σ_{k=1}^{n} exp(e_ik)
c_i = Σ_{j=1}^{n} α_ij h_j
s_i = GRU_decoder(y_{i-1}, s_{i-1}, c_i)
where i indexes decoder time steps and j encoder time steps; s_i is the hidden state of the decoder at time i in the decoding process; h_j is the intermediate state of the encoder at time j in the encoding process; e_ij is the attention importance computed from the decoder hidden state s_{i-1} at the previous time step and the encoder intermediate states h_j at the different times j, with W_a a learned parameter matrix; α_ij is the normalized attention weight assigned to the encoder intermediate state at time j; n is the length of the input; c_i is the context vector obtained as the attention-weighted sum of all encoder intermediate states; and y_i is the word vector of the word generated at time i.
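A sketch of the attention computation follows. The exact form of the score e_ij is not fully spelled out in the text, so this assumes the common bilinear form e_ij = s_{i-1}^T W_a h_j:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [v / total for v in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_context(s_prev, H, W_a):
    """Bilinear attention: e_ij = s_{i-1}^T W_a h_j.

    s_prev : previous decoder state s_{i-1} (list of floats)
    H      : encoder states [h_1, ..., h_n]
    W_a    : parameter matrix as a list of rows
    """
    # score every encoder state against the previous decoder state
    Wh = [[dot(row, h_j) for row in W_a] for h_j in H]  # W_a h_j
    e = [dot(s_prev, wh) for wh in Wh]                  # e_ij
    alpha = softmax(e)                                  # normalized weights
    # context c_i: attention-weighted sum of all encoder states
    dim = len(H[0])
    c = [sum(alpha[j] * H[j][k] for j in range(len(H))) for k in range(dim)]
    return alpha, c
```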
As shown in fig. 2, the multi-model emotion reply generation module includes a plurality of Seq2Seq models corresponding to the emotion categories, generated as follows. First, a general model is trained on the whole corpus, and different generation models are then obtained by fine-tuning it on different subdivided data sets. Second, on the basis of the general model, data sets of the two emotion polarities are used for training, yielding a positive emotion model and a negative emotion model. Finally, on the basis of the positive and negative emotion models respectively, data sets of the individual emotion categories are used for training, yielding one model per category: the positive emotion model is fine-tuned into a happy model and a like model, and the negative emotion model into a disgust model, a sad model and an anger model.
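The three-level fine-tuning tree can be sketched as follows; `train` and `finetune` are hypothetical stand-ins for real Seq2Seq training, and a "model" is represented simply by the list of corpora it has seen so that each emotion model's lineage is visible:

```python
# Sketch of the hierarchical fine-tuning of the multi-model module.
def train(corpus):
    return [corpus]

def finetune(model, corpus):
    return model + [corpus]

def build_emotion_models(corpora):
    """corpora maps emotion name -> corpus tag for the 5 emotion categories."""
    general = train("all")                    # level 1: whole corpus
    positive = finetune(general, "positive")  # level 2: happy + like data
    negative = finetune(general, "negative")  # level 2: disgust + sad + anger
    return {                                  # level 3: one model per emotion
        "happy":   finetune(positive, corpora["happy"]),
        "like":    finetune(positive, corpora["like"]),
        "disgust": finetune(negative, corpora["disgust"]),
        "sad":     finetune(negative, corpora["sad"]),
        "anger":   finetune(negative, corpora["anger"]),
    }
```

Each leaf model thus inherits general fluency from level 1 and polarity-specific style from level 2 before seeing its own small emotion corpus.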
Compared with earlier models, the model in this embodiment effectively improves the accuracy of replies for emotion categories whose data quality is low (such as anger and disgust). Compared with replies of positive emotions such as happy and like, replies with negative emotions such as anger or disgust not only have less data but also express emotion more subtly. For a computer it is therefore easier to learn to generate replies with positive emotions than replies with emotions such as anger or disgust, since the positive-emotion data is larger and its emotion expression more explicit. For this reason it is notable that the model in this embodiment achieves good performance in generating replies for negative emotions, especially the anger and disgust categories.
Traditional Seq2Seq models tend to generate generalized, generic replies; although such sentences have some fluency, they cannot give the reply a specific emotion in an emotion dialogue. A single-model emotion reply generation module is therefore introduced in this embodiment. It comprises only one Seq2Seq model, in which the emotion category i is converted into an emotion vector e_i; e_i is added into the decoding process of the decoder, so that every reply generated by the decoder carries the emotion-category information and the reply develops toward a certain emotion during generation. The emotion vector is randomly initialized and continuously updated during learning, so that the Seq2Seq model comes to contain the emotion-category information. The corresponding decoder computation is:
s_t = GRU_decoder(y_{t-1}, s_{t-1}, c_t, e_i)
After the emotion vector is added, the model can perceive the emotion-category information when generating a reply, but emotion expression is usually carried by specific emotion words, so a copy mechanism is adopted to raise the generation probability of emotion words during generation. The specific process is as follows: the words obtained from the input are divided into emotion words and non-emotion words; the emotion words are converted into emotion word vectors, which interact with the current hidden state s_t of the decoder to give the generation probability of the emotion words; this probability is then added to the ordinary generation probability, so that the probability of generating emotion words in the reply increases. The generation probability of the decoder is expressed as follows:
p(y_t | s_t) = softmax(p_ori(y_t | s_t) + p_copy(y_t | s_t, E))
p_copy(y_t | s_t, E) = softmax(E W_e s_t)
where E is the matrix of word vectors of all emotion words; W_e is a learned parameter matrix; y_t is the index of the word generated at time t; s_t is the intermediate state vector of the decoder at time t; p_ori is the generation probability of word y_t under state s_t for the original decoder; and p_copy is the additional copy probability of emotion word y_t under state s_t, which is 0 when y_t is not an emotion word.
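A hedged sketch of the copy-augmented generation probability. It reads p_ori as the decoder's pre-softmax vocabulary scores, which is one possible interpretation of the formulas above; all names and shapes are illustrative:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [v / s for v in exps]

def copy_augmented_probs(ori_scores, s_t, E, W_e, emotion_ids):
    """Combine ordinary decoder scores with the copy-mechanism bonus.

    ori_scores  : decoder scores over the vocabulary (before softmax)
    s_t         : current decoder state
    E           : emotion-word vectors, one row per emotion word
    W_e         : parameter matrix rows, so that p_copy = softmax(E W_e s_t)
    emotion_ids : vocabulary index of each emotion word (row of E)
    """
    # p_copy over the emotion words only
    We_s = [sum(W_e[r][k] * s_t[k] for k in range(len(s_t)))
            for r in range(len(W_e))]
    copy_scores = [sum(e_row[r] * We_s[r] for r in range(len(We_s)))
                   for e_row in E]
    p_copy_emotion = softmax(copy_scores)
    # scatter the copy probability to vocabulary positions; non-emotion
    # words get 0, as p_copy is 0 when y_t is not an emotion word
    p_copy = [0.0] * len(ori_scores)
    for idx, p in zip(emotion_ids, p_copy_emotion):
        p_copy[idx] = p
    combined = [o + c for o, c in zip(ori_scores, p_copy)]
    return softmax(combined)
```

With equal base scores, the emotion word receives the extra copy mass and ends up the most probable token.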
Given a text and an emotion as input, the model of the single-model emotion reply generation module thus tends to generate a reply carrying that emotion.
A reply candidate database is established for all replies generated by the emotion dialogue generation module, including the basic reply and the replies with emotion. To select the best reply from this database for output, this embodiment introduces a reordering module.
The scoring mechanism of the reordering module includes an emotion consistency score and a semantic consistency score.
The specific determination process for emotion consistency scores is as follows:
based on the emotion vocabulary ontology library released by university of Connect and the result of chi-square clustering according to different emotion text data, different emotion dictionaries are constructed for different emotion types. Each emotion dictionary gives emotion words under the emotion category and emotion scores corresponding to the emotion words. The score of the emotion word combines the weight given by the emotion vocabulary ontology library and the word frequency in the data set, and reflects the importance degree of the emotion word for expressing the emotion. In general, explicit emotion words have a higher score than implicit emotion words. For example, in the emotion dictionary of the happy emotion category, "happy" has a higher score than "winning". According to the emotion dictionary, emotion scores corresponding to each emotion word can be obtained. The emotion score of a sentence is the sum of emotion scores of emotion words appearing in the sentence.
Furthermore, the expression of emotion may be enhanced, weakened or reversed by degree adverbs such as "very", "a bit" and "not". To reflect the impact of these adverbs on emotion expression, this embodiment classifies them by degree level and gives them different weights. Degree words that enhance emotion expression, such as "very", have a weight greater than 1 and increase the emotion score after multiplication; words that weaken emotion expression, such as "a bit", have a weight less than 1 and decrease the score; words that reverse emotion expression, such as "not", have a weight of -1 and flip the sign of the score. A double negative multiplies -1 by -1 and leaves the emotion score unchanged, i.e., a double negative amounts to an affirmation.
In summary, the emotion consistency score is calculated according to the following formula:

E(y) = Σ_{m=1}^{M} E_m, with E_m = γ_m · w_m · Π_{j∈index(m-1,m)} q_{y_j},

where M is the number of emotion words; E(y) and E_m represent the emotion scores of the candidate reply y and of the emotion word m, respectively; index(m-1, m) represents the span from the previous emotion word to the current emotion word; q_{y_j} represents the weight score of a degree adverb y_j within that span; w_m represents the weight score of the emotion word m in the emotion dictionary; γ_m indicates whether the emotion word m is in the corresponding emotion dictionary: it is set to 1 if the word is in the emotion dictionary of the corresponding emotion category, indicating a positive contribution to expressing that category (such as "happiness" appearing in the happy category), and to -1 otherwise, indicating a negative contribution to expressing that category (such as "sadness" appearing in the happy category).
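As a minimal sketch of this scoring rule, the following Python snippet accumulates adverb weights between successive emotion words and multiplies them into each emotion word's dictionary weight. The dictionary contents and adverb weights below are illustrative stand-ins (the real values come from the emotion vocabulary ontology library and corpus word frequencies), and γ_m is fixed to +1, i.e. scoring against a single target-emotion dictionary.

```python
EMOTION_DICT = {"happy": 2.0, "winning": 1.0}  # w_m: emotion-word weights (illustrative)
ADVERB_WEIGHTS = {"very": 1.5, "slightly": 0.5, "not": -1.0}  # q: degree-adverb weights

def emotion_score(tokens, emotion_dict=EMOTION_DICT, adverb_weights=ADVERB_WEIGHTS):
    """Sum over emotion words m of (product of adverb weights seen since
    the previous emotion word) * w_m; gamma_m is fixed to +1 here."""
    score, adverb_product = 0.0, 1.0
    for tok in tokens:
        if tok in emotion_dict:
            score += adverb_product * emotion_dict[tok]
            adverb_product = 1.0  # reset the adverb span for the next emotion word
        elif tok in adverb_weights:
            adverb_product *= adverb_weights[tok]
    return score

# "not very happy": (-1.0) * 1.5 * 2.0 = -3.0 (reversal then enhancement)
print(emotion_score(["not", "very", "happy"]))
```

Note how the double negative works out automatically: "not not happy" scores (-1) × (-1) × 2.0 = 2.0, the same as "happy" alone.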
Semantic consistency score
The present embodiment uses term overlap as the semantic consistency score, encouraging the model to generate replies whose content is consistent with the input. Specifically, the number of terms shared by the two sentences is taken as the semantic consistency score, computed by the following formula:
T(y)=Count(x,y)
wherein Count(x, y) is the number of terms shared by the input sentence x and the candidate reply y.
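A one-function sketch of this overlap count follows. Counting over unique terms is an assumption here; the text does not specify how repeated terms are handled.

```python
def count_shared_terms(x_tokens, y_tokens):
    """Count(x, y): number of terms shared by input x and candidate reply y,
    counted over unique terms (an assumption not fixed by the text)."""
    return len(set(x_tokens) & set(y_tokens))

print(count_shared_terms(["i", "love", "sunny", "days"],
                         ["sunny", "days", "are", "great"]))  # shares "sunny", "days"
```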
The reordering module combines the emotion consistency score E(y) and the semantic consistency score T(y) into a total score:
Φ(y)=λ·E(y)+(1-λ)·T(y)
where Φ (y) is the total score used for ranking, λ is the weight value that adjusts both scores.
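The linear combination and the reranking step can be sketched as follows; the candidate triples and the λ value are illustrative.

```python
def total_score(E_y, T_y, lam=0.5):
    """Phi(y) = lambda * E(y) + (1 - lambda) * T(y)."""
    return lam * E_y + (1 - lam) * T_y

def rerank(candidates, lam=0.5):
    """candidates: list of (reply, E(y), T(y)) triples.
    Returns the reply with the highest total score."""
    return max(candidates, key=lambda c: total_score(c[1], c[2], lam))[0]

candidates = [("base reply", 0.0, 3.0),
              ("emotional reply", 4.0, 1.0)]
# With lam=0.7 the emotion score dominates: 0.7*4 + 0.3*1 = 3.1 > 0.9
print(rerank(candidates, lam=0.7))
```

Raising λ favors emotionally consistent candidates; lowering it favors candidates that echo more of the input's content.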
Example two
Based on the same inventive concept, the embodiment discloses an emotion dialogue generation method, which comprises the following steps:
s1, generating basic replies with correct semantics by detecting entity words of input sentences;
s2, establishing emotion models corresponding to various emotions through hierarchical training, and fine-tuning basic replies based on the emotion models to obtain replies containing the emotions;
s3, training a single emotion model that takes the emotion type as input, and outputting emotion replies according to the emotion model;
s4, receiving the replies output in the steps S1-S3, scoring the output replies, and rearranging according to the scores, wherein the reply with the highest score is the final emotion reply.
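The pipeline S1–S4 can be wired together as below. The three generator functions are placeholders for the rule-based module and the two Seq2Seq-based modules described above; their replies are illustrative.

```python
def generate_basic_reply(sentence):               # S1 (placeholder)
    return "That sounds interesting."

def generate_multi_model_reply(sentence, emotion):   # S2 (placeholder)
    return "I am so happy to hear that!"

def generate_single_model_reply(sentence, emotion):  # S3 (placeholder)
    return "Great news, congratulations!"

def emotion_dialogue(sentence, emotion, score_fn):
    """S4: collect the three candidates and return the highest-scoring one.
    score_fn stands in for the reordering module's total score Phi."""
    candidates = [
        generate_basic_reply(sentence),
        generate_multi_model_reply(sentence, emotion),
        generate_single_model_reply(sentence, emotion),
    ]
    return max(candidates, key=score_fn)
```

In the actual system, `score_fn` would be the combined emotion/semantic consistency score rather than a generic callable.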
The foregoing is merely illustrative of the present application and does not limit it; any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (9)

1. An emotion conversation generation system, comprising: an emotion dialogue generation module and a reordering module,
the emotion dialogue generation module comprises a basic reply generation module, which generates a basic reply with correct semantics by detecting entity words of an input sentence; a multi-model emotion reply generation module, which establishes emotion models corresponding to the various emotions through hierarchical training and fine-tunes the basic reply based on the emotion models to obtain emotion replies; and a single-model emotion reply generation module, which trains an emotion model that takes the emotion type as input and outputs emotion replies according to the emotion model;
the reordering module receives the replies output by the three sub-modules of the dialogue generation module, scores the output replies, and reorders them according to the scores; the reply with the highest score is the final emotion reply;
the multi-model emotion response generation module comprises a plurality of Seq2Seq models corresponding to emotion categories, wherein the single-model emotion response generation module only comprises one Seq2Seq model, and in the Seq2Seq model, a copying mechanism is adopted to improve the generation probability of emotion words in the response generation process, and the specific process is as follows: dividing words obtained from all input semantics into emotion words and non-emotion words, converting the emotion words into emotion word vectors, and combining all emotion word vectors with the current implicit state s of a decoder t Interaction is carried out, the generation probability of the emotion words is obtained, and then the generation probability is added with the additionally increased probability generated by the copying mechanism, so that the generation probability of the emotion words in the reply generation process is improved, and the generation probability of the decoder is expressed as the following formula:
p(y_t | s_t) = softmax(p_ori(y_t | s_t) + p_copy(y_t | s_t, E))
p_copy(y_t | s_t, E) = softmax(E W_e s_t)
wherein E is the matrix of word vectors of all emotion words; W_e is a learned parameter matrix; y_t is the index of the word generated at time t; s_t is the intermediate state vector of the decoder at time t; p_ori is the original generation probability of the word y_t given s_t in the decoder; p_copy is the additional copy probability of the emotion word y_t given s_t, and p_copy is 0 when y_t is not an emotion word.
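Purely as an illustration (not part of the claim), the copy-style boost above can be sketched in NumPy. The shapes and the restriction of p_copy to emotion-word vocabulary indices are assumptions consistent with the formulas.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def decode_step_with_copy(p_ori_logits, E, W_e, s_t, emotion_ids, vocab_size):
    """One decoding step: p(y_t|s_t) = softmax(p_ori + p_copy), where
    p_copy = softmax(E @ W_e @ s_t) is spread over the emotion-word ids
    and is zero elsewhere. Assumed shapes: E (k, d_e), W_e (d_e, d_s), s_t (d_s,)."""
    p_copy_emotion = softmax(E @ W_e @ s_t)   # (k,) over the k emotion words
    p_copy = np.zeros(vocab_size)
    p_copy[emotion_ids] = p_copy_emotion      # 0 for non-emotion words
    return softmax(p_ori_logits + p_copy)

# Toy check: boost emotion-word ids 1 and 3 in a 5-word vocabulary.
rng = np.random.default_rng(0)
p = decode_step_with_copy(rng.normal(size=5),
                          rng.normal(size=(2, 4)),   # k=2 emotion words, d_e=4
                          rng.normal(size=(4, 3)),   # d_s=3
                          rng.normal(size=3),
                          emotion_ids=[1, 3], vocab_size=5)
print(p.sum())  # a valid distribution over the vocabulary
```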
2. The emotion conversation generation system of claim 1, wherein the basic reply generation module identifies entity words having actual meaning in the input sentence based on a rule generation model in which manually composed reply templates are embedded; a reply template is determined according to the entity words, and a semantically correct basic reply is determined from the reply template.
3. The emotion conversation generation system of claim 1 or 2, wherein said multi-model emotion reply generation module and single-model emotion reply generation module each generate emotion-category models through a Seq2Seq model, said Seq2Seq model comprising an Encoder and a Decoder, the Encoder converting the input sentence into a dense intermediate state vector H = (h_1, h_2, …, h_n), and the Decoder decoding this intermediate state vector H into the output sentence Y of the emotion model.
4. The emotion dialog generation system of claim 3, wherein an attention mechanism is introduced in the Seq2Seq model for enriching the input information of the decoder, and the decoder with the attention mechanism decodes using the following formulas:

s_i = GRU_decoder(y_{i-1}, s_{i-1}, c_i)
e_ij = s_{i-1} W_a h_j
α_ij = exp(e_ij) / Σ_{k=1}^{n} exp(e_ik)
c_i = Σ_{j=1}^{n} α_ij h_j

where i indexes the decoder time steps and j the encoder time steps; s_i is the hidden state of the decoder at each time i during decoding; h_j is the vector representation at time j of the intermediate states H produced during encoding; e_ij is the attention importance computed from the decoder hidden state s_{i-1} at the previous time step and the encoder intermediate state h_j, with W_a a learned parameter matrix; α_ij is the weight assigned to the encoder intermediate state at each time step, obtained by normalizing the importances through the attention mechanism; n is the length of the input; c_i is the vector representation of the context information, computed by weighting and summing all encoder intermediate states with the attention weights; and y_i is the word vector of the word generated at time i.
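As an illustration only, the bilinear attention step above can be sketched in NumPy; the shapes are assumptions matching the variable descriptions.

```python
import numpy as np

def attention_context(s_prev, H, W_a):
    """Bilinear attention: e_ij = s_{i-1} W_a h_j, alpha = softmax_j(e_ij),
    c_i = sum_j alpha_ij h_j. Assumed shapes: s_prev (d_s,), H (n, d_h),
    W_a (d_s, d_h)."""
    e = H @ (W_a.T @ s_prev)              # (n,) attention importances e_ij
    e = e - e.max()                       # numerical stability
    alpha = np.exp(e) / np.exp(e).sum()   # (n,) normalized weights alpha_ij
    c = alpha @ H                         # (d_h,) context vector c_i
    return alpha, c

rng = np.random.default_rng(1)
alpha, c = attention_context(rng.normal(size=3),        # d_s = 3
                             rng.normal(size=(4, 5)),   # n = 4 encoder steps, d_h = 5
                             rng.normal(size=(3, 5)))
print(alpha.sum())  # attention weights sum to 1
```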
5. The emotion conversation generation system of claim 3, wherein said multi-model emotion reply generation module comprises a plurality of Seq2Seq models corresponding to the emotion categories, said Seq2Seq models being generated as follows: first, a general model is trained on the whole corpus; second, the corpus is divided by emotion type into a positive emotion corpus and a negative emotion corpus, and the general model is fine-tuned on each, yielding a positive emotion model and a negative emotion model; finally, the positive emotion corpus is divided into a happy emotion corpus and a liking emotion corpus, and the positive emotion model is fine-tuned on each, yielding a model corresponding to happiness and a model corresponding to liking; the negative emotion corpus is divided into disgust, sadness, and anger emotion corpora, and the negative emotion model is fine-tuned on each, yielding a model corresponding to disgust, a model corresponding to sadness, and a model corresponding to anger.
6. The emotion conversation generation system of claim 4, wherein only one Seq2Seq model is included in said single-model emotion reply generation module, said Seq2Seq model converting the emotion category i into an emotion vector e_i and adding the emotion vector e_i into the decoding process of the decoder, so that the Seq2Seq model contains emotion category information.
7. The emotional dialog generation system of claim 1 or 2, wherein the scoring mechanism of the reordering module comprises an emotion consistency score and a semantic consistency score,
the emotion consistency score builds a different emotion dictionary for each emotion type, each emotion dictionary giving the emotion words under its emotion category and the emotion score corresponding to each emotion word; the emotion consistency score is calculated as follows:
E(y) = Σ_{m=1}^{M} E_m, with E_m = γ_m · w_m · Π_{j∈index(m-1,m)} q_{y_j},

wherein M is the number of emotion words; E(y) and E_m represent the emotion scores of the candidate reply y and of the emotion word m, respectively; index(m-1, m) represents the span from the previous emotion word to the current emotion word; q_{y_j} represents the weight score of a degree adverb y_j within said span; w_m represents the weight score of the emotion word m in the emotion dictionary; γ_m indicates whether the emotion word m is in the corresponding emotion dictionary, being set to 1 if the word is in the emotion dictionary of the corresponding emotion category, indicating a positive contribution to the expression of that category, and to -1 otherwise, indicating a negative contribution to the expression of that category;
the formula of the semantic consistency score is as follows:
T(y)=Count(x,y)
wherein Count (·) is the number of identical terms of two sentences.
8. The emotion conversation generation system of claim 7, wherein the reordering module combines the emotion consistency score E(y) and the semantic consistency score T(y) into a total score, the total score being:
Φ(y)=λ·E(y)+(1-λ)·T(y)
where Φ (y) is the total score used for ranking, λ is the weight value that adjusts both scores.
9. An emotion dialogue generation method, characterized by being used for an emotion dialogue generation system according to any one of claims 1 to 8, comprising the steps of:
s1, generating basic replies with correct semantics by detecting entity words of input sentences;
s2, establishing emotion models corresponding to various emotions through hierarchical training, and fine-tuning the basic reply based on the emotion models to obtain a reply containing emotion;
s3, training a single emotion model that takes the emotion type as input, and outputting emotion replies according to the emotion model;
s4, receiving the replies output in the steps S1-S3, scoring the output replies, and rearranging according to the scores, wherein the reply with the highest score is the final emotion reply.
CN202010074840.8A 2020-01-22 2020-01-22 Emotion dialogue generation system and method Active CN111241250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010074840.8A CN111241250B (en) 2020-01-22 2020-01-22 Emotion dialogue generation system and method


Publications (2)

Publication Number Publication Date
CN111241250A CN111241250A (en) 2020-06-05
CN111241250B (en) 2023-10-24

Family

ID=70866275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010074840.8A Active CN111241250B (en) 2020-01-22 2020-01-22 Emotion dialogue generation system and method

Country Status (1)

Country Link
CN (1) CN111241250B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417125B (en) * 2020-12-01 2023-03-24 南开大学 Open domain dialogue reply method and system based on deep reinforcement learning
CN112530415B (en) * 2021-02-10 2021-07-16 北京百度网讯科技有限公司 Negative reply recognition model acquisition and negative reply recognition method and device
CN112818090B (en) * 2021-02-24 2023-10-03 中国人民大学 Method and system for generating answer questions and questions based on harmonic words
CN112905776B (en) * 2021-03-17 2023-03-31 西北大学 Emotional dialogue model construction method, emotional dialogue system and method
CN113139042B (en) * 2021-04-25 2022-04-29 内蒙古工业大学 Emotion controllable reply generation method using fine-tuning and reordering strategy
CN113360614A (en) * 2021-05-31 2021-09-07 多益网络有限公司 Method, device, terminal and medium for controlling reply emotion of generating type chat robot
CN114610861B (en) * 2022-05-11 2022-08-26 之江实验室 End-to-end dialogue method integrating knowledge and emotion based on variational self-encoder

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960277A (en) * 2017-05-19 2018-12-07 百度(美国)有限责任公司 Cold fusion is carried out to sequence to series model using language model
CN109635253A (en) * 2018-11-13 2019-04-16 平安科技(深圳)有限公司 Text style conversion method, device and storage medium, computer equipment
CN109800295A (en) * 2019-01-11 2019-05-24 南京信息工程大学 The emotion session generation method being distributed based on sentiment dictionary and Word probability
CN109977201A (en) * 2019-01-28 2019-07-05 平安科技(深圳)有限公司 Machine chat method, device, computer equipment and storage medium with emotion
CN110427490A (en) * 2019-07-03 2019-11-08 华中科技大学 A kind of emotion dialogue generation method and device based on from attention mechanism
WO2019235103A1 (en) * 2018-06-07 2019-12-12 日本電信電話株式会社 Question generation device, question generation method, and program



Similar Documents

Publication Publication Date Title
CN111241250B (en) Emotion dialogue generation system and method
CN110782870B (en) Speech synthesis method, device, electronic equipment and storage medium
Yao et al. An improved LSTM structure for natural language processing
Cahn CHATBOT: Architecture, design, & development
Wu et al. Emotion recognition from text using semantic labels and separable mixture models
CN111145718B (en) Chinese mandarin character-voice conversion method based on self-attention mechanism
WO2019000170A1 (en) Generating responses in automated chatting
CN111460132B (en) Generation type conference abstract method based on graph convolution neural network
Colombo Learning to represent and generate text using information measures
CN112417894A (en) Conversation intention identification method and system based on multi-task learning
CN109308316B (en) Adaptive dialog generation system based on topic clustering
CN112131367A (en) Self-auditing man-machine conversation method, system and readable storage medium
CN112818106A (en) Evaluation method of generating type question and answer
CN113239666A (en) Text similarity calculation method and system
CN114911932A (en) Heterogeneous graph structure multi-conversation person emotion analysis method based on theme semantic enhancement
Bird et al. Optimisation of phonetic aware speech recognition through multi-objective evolutionary algorithms
Lin Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook
CN112948558B (en) Method and device for generating context-enhanced problems facing open domain dialog system
CN114328866A (en) Strong anthropomorphic intelligent dialogue robot with smooth and accurate response
CN111949762B (en) Method and system for context-based emotion dialogue and storage medium
CN110046239B (en) Dialogue method based on emotion editing
Kondurkar et al. Modern Applications With a Focus on Training ChatGPT and GPT Models: Exploring Generative AI and NLP
CN116303966A (en) Dialogue behavior recognition system based on prompt learning
Dilawari et al. Neural attention model for abstractive text summarization using linguistic feature space
CN112199503B (en) Feature-enhanced unbalanced Bi-LSTM-based Chinese text classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant