CN116680369A - Co-emotion dialogue generation method and system - Google Patents

Co-emotion dialogue generation method and system

Info

Publication number: CN116680369A
Authority: CN (China)
Prior art keywords: emotion, vector, context, common sense, dialogue
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CN202310420171.9A
Other languages: Chinese (zh)
Other versions: CN116680369B (en)
Inventors: 刘智, 崔晨阳, 戴志诚, 刘三女牙
Current Assignee: Central China Normal University
Original Assignee: Central China Normal University
Application filed by Central China Normal University; priority to CN202310420171.9A; publication of CN116680369A; application granted; publication of CN116680369B
Legal status: Active

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/338 Presentation of query results
    • G06F16/34 Browsing; Visualisation therefor
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a co-emotion dialogue generation method and system, belonging to the field of natural language processing, comprising the following steps: S1, acquiring a dialogue data set; S2, performing emotion labeling on the data set and acquiring common sense knowledge from a common sense knowledge base; S3, encoding the labeled emotion and the acquired common sense knowledge to obtain an emotion context vector and a common sense context vector; S4, combining the emotion context vector and the common sense context vector into a dual-feature context vector; S5, estimating the emotion state of the reply from the historical dialogue sequence and converting the state into a mixed emotion matrix; S6, inputting the dual-feature context vector and the mixed emotion matrix to a multi-source decoder to generate a reply. According to the invention, emotion perception and co-emotion expression are enhanced through joint modeling of emotion and common sense knowledge, and the reasoning of the reply emotion state makes the generated reply emotionally consistent with the real scene.

Description

Co-emotion dialogue generation method and system
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a method and a system for generating a co-emotion dialogue.
Background
Empathy (co-emotion) is a very important skill in interpersonal communication; it refers to the ability to understand and experience the feelings of others emotionally. Researchers have proposed the co-emotion dialogue generation task, which brings co-emotion dialogue generation technology into dialogue systems so that they understand the feelings and needs of the user, thereby improving user satisfaction and trust and making dialogue systems more intelligent and personalized.
Thanks to the development of dialogue generation technology, a number of co-emotion dialogue generation methods have been proposed. Based on the type of modeling, these methods fall into two categories: 1. methods that focus on modeling emotion to improve co-emotion expression; for example, Lin et al. designed multiple decoders to react to different emotions and combined them to generate the final co-emotion reply, and Li et al. utilized coarse-grained emotion labels and fine-grained emotion words from the emotion dictionary NRC to understand subtle differences in user emotion. 2. methods that focus on modeling common sense knowledge to enhance emotion perception; for example, Sabour et al. used common sense knowledge to perceive the user's potential feelings and generate a co-emotion reply. Despite the progress made by these co-emotion dialogue methods, two problems remain. 1. They model only emotion or only common sense knowledge, which often leads to insufficient emotion perception and co-emotion expression. 2. The generated replies are emotionally inconsistent with the real scene, so they cannot meet the user's potential psychological needs.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a co-emotion dialogue generation method and system, so as to solve the problems that existing co-emotion dialogue generation methods have insufficient emotion perception and co-emotion expression capability and that the generated replies are emotionally inconsistent with real scenes.
To achieve the above object, in a first aspect, the present invention provides a co-emotion dialogue generation method, including the following steps:
obtaining a dialog data set comprising a plurality of dialog records, each dialog record comprising a context utterance and a reply utterance;
inputting the last utterance of the context in each dialogue record, together with the preset common sense knowledge categories, into a common sense library to obtain a common sense reasoning set for each dialogue record; splicing the common sense reasoning set of each dialogue record under each common sense knowledge category to obtain a corresponding common sense sequence, encoding the common sense sequence, and performing attention computation to obtain a context-related common sense vector under each common sense category;
performing emotion labeling on each utterance in each dialogue record to obtain the context utterance emotion labels and the reply utterance emotion label corresponding to each dialogue record; splicing the emotion-labeled context utterances of each dialogue record to obtain a context sequence, obtaining the context vector of each dialogue record from the word embedding vector and dialogue state embedding vector corresponding to the context sequence, and obtaining the emotion context vector of each dialogue record from the word embedding vector, dialogue state embedding vector and emotion embedding vector corresponding to the context sequence;
splicing the context vector of each dialogue record with the context-related common sense vector under each common sense category to obtain the common sense context vector of each dialogue record under that category; splicing the emotion context vector of each dialogue record with the common sense context vectors under all common sense categories to obtain a vector fusing emotion and common sense, and applying an activation function to highlight the important features of the fused vector to obtain a context vector fusing dual features;
sequentially determining the emotion state of the dialogue at each moment based on the historical dialogue record and a gated recurrent unit, inputting the emotion state at the last moment into a multi-layer perceptron with an activation function to estimate the emotion state of the reply utterance, then computing the probability that each kind of emotion is expressed through a normalization function, and obtaining the corresponding mixed emotion matrix from the emotion probabilities and the corresponding emotion vector representations;
generating the reply utterance of the dialogue record based on the context vector fusing dual features and the mixed emotion matrix.
In one possible implementation, emotion labeling is performed on each utterance in each dialogue record to obtain the context utterance emotion labels and the reply utterance emotion label corresponding to each dialogue record, specifically as follows:
automatically labeling the emotion of each utterance in the context and the emotion of the reply utterance with an emotion classification pre-training model to obtain a context emotion label sequence E = [e_1, e_2, ..., e_{k-1}] and a reply utterance emotion label e_k, where each e is one of multiple emotion classes;
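For illustration only, the following Python sketch shows one way such automatic emotion labeling could be wired up with a generic pretrained classifier; the checkpoint name and its label set are assumptions of this sketch and do not form part of the claimed method.

```python
# Illustrative sketch only: automatic emotion labeling of the utterances in a
# dialogue record with a generic pretrained emotion classifier. The checkpoint
# name and its label set are placeholder assumptions.
from transformers import pipeline

emotion_clf = pipeline("text-classification",
                       model="j-hartmann/emotion-english-distilroberta-base")  # assumed checkpoint

def label_dialogue(utterances):
    """Return one emotion label per utterance, i.e. [e_1, ..., e_k]."""
    return [emotion_clf(u)[0]["label"] for u in utterances]

dialogue = ["I lost my wallet on the bus today.",
            "Oh no, that sounds really stressful.",
            "Yeah, all my cards were in it."]
labels = label_dialogue(dialogue)                        # e.g. ["sadness", "fear", "sadness"]
context_labels, reply_label = labels[:-1], labels[-1]    # E = [e_1, ..., e_{k-1}] and e_k
```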
inputting the last utterance of the context in each dialogue record, together with the preset common sense knowledge categories, into a common sense library to obtain a common sense reasoning set for each dialogue record, specifically as follows:
the last utterance u_k of the context and a preset common sense type r are input into the common sense library COMET to obtain the common sense set S_r:
S_r = COMET(r, u_k)
the preset common sense type r comprises a plurality of common sense categories, and the common sense set corresponding to each category comprises five common sense inferences.
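As a non-limiting sketch of this retrieval step, the code below assumes a COMET-style generator wrapped in a hypothetical `comet_generate` callable that takes the last utterance and a relation tag and returns five inference strings, matching the interface described above.

```python
# Illustrative sketch: querying a COMET-style common sense model for several
# relation categories. `comet_generate` is a hypothetical wrapper around a
# trained COMET checkpoint; only its interface follows the description above.
RELATIONS = ["xWant", "xNeed", "xIntent", "xEffect"]   # assumed preset common sense categories r

def build_commonsense_sets(last_utterance, comet_generate, num_generate=5):
    """Return {r: S_r}, where S_r is a list of five common sense inferences for u_k."""
    return {r: comet_generate(last_utterance, relation=r, num_generate=num_generate)
            for r in RELATIONS}

def build_commonsense_sequences(cs_sets):
    """Splice the five inferences of each category into one common sense sequence CS_r."""
    return {r: " ".join(inferences) for r, inferences in cs_sets.items()}
```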
In one possible implementation, the context vector and the emotion context vector of each dialogue record are determined as follows:
the k-1 utterances of the dialogue context [u_1, u_2, ..., u_{k-1}] are spliced to obtain the context sequence C;
the word embedding vector E_W(C), the dialogue state embedding vector E_D(C) and the utterance emotion embedding vector E_E(C) corresponding to the context sequence are added and input to the emotion encoder Enc_emo to obtain the emotion context vector H_emo-ctx:
H_emo-ctx = Enc_emo(E_W(C) + E_E(C) + E_D(C))
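A minimal sketch of this encoding step follows, assuming Transformer encoders and learnable embedding tables; the class name, dimensions and emotion-set size are placeholders introduced here for illustration only.

```python
# Illustrative sketch: an emotion-aware context encoder that sums word,
# dialogue-state and emotion embeddings before a Transformer encoder,
# mirroring H_emo-ctx = Enc_emo(E_W(C) + E_E(C) + E_D(C)). Sizes are assumptions.
import torch.nn as nn

class EmotionContextEncoder(nn.Module):
    def __init__(self, vocab_size, n_states=2, n_emotions=32, d_model=300, n_heads=6, n_layers=4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)      # E_W
        self.state_emb = nn.Embedding(n_states, d_model)       # E_D, e.g. speaker/listener turn
        self.emo_emb = nn.Embedding(n_emotions, d_model)       # E_E, per-token emotion label id
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # Enc_emo

    def forward(self, token_ids, state_ids, emo_ids):
        # all inputs: (batch, seq_len), aligned with the spliced context sequence C
        x = self.word_emb(token_ids) + self.state_emb(state_ids) + self.emo_emb(emo_ids)
        return self.encoder(x)                                  # H_emo-ctx: (batch, seq_len, d_model)
```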
the common sense inferences in each common sense set are spliced to obtain the common sense sequence CS_r, which is then input to the common sense encoder Enc_cs to obtain the common sense vector; finally, the context-related common sense vector is computed through the attention mechanism, with the alignment vector α = softmax(e),
where E_W is the word embedding layer, e is the score vector computed from the context vector corresponding to u_k and the common sense vector, α is the alignment vector, W_a and v are trainable parameters of the fully connected layer, L_r is the length of the common sense sequence CS_r, and α_i is the score corresponding to the i-th word in the score vector; the context-related common sense vector is obtained as the α-weighted combination of the common sense vector;
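The following sketch shows one plausible additive-attention realization of this step; the exact score function is an assumption consistent with the quantities named above (score vector e, alignment vector α = softmax(e), trainable parameters W_a and v).

```python
# Illustrative sketch: attention that condenses the encoded common sense
# sequence into a single context-related common sense vector. The additive
# score function is an assumption; only the named quantities follow the text.
import torch
import torch.nn as nn

class CommonSenseAttention(nn.Module):
    def __init__(self, d_model=300):
        super().__init__()
        self.W_a = nn.Linear(2 * d_model, d_model)    # trainable parameters of the fully connected layer
        self.v = nn.Linear(d_model, 1, bias=False)

    def forward(self, h_k_ctx, cs_states):
        # h_k_ctx:   (batch, d_model) context vector corresponding to the last utterance u_k
        # cs_states: (batch, L_r, d_model) encoded common sense sequence CS_r
        L_r = cs_states.size(1)
        query = h_k_ctx.unsqueeze(1).expand(-1, L_r, -1)
        e = self.v(torch.tanh(self.W_a(torch.cat([query, cs_states], dim=-1)))).squeeze(-1)  # score vector e
        alpha = torch.softmax(e, dim=-1)                             # alignment vector alpha = softmax(e)
        return torch.bmm(alpha.unsqueeze(1), cs_states).squeeze(1)   # context-related common sense vector
```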
the word embedding vector E_W(C) and the dialogue state embedding vector E_D(C) corresponding to the context sequence are added and input to the context encoder Enc_ctx to obtain the context vector H_ctx:
H_ctx = Enc_ctx(E_W(C) + E_D(C))
the context-related common sense vector and the context vector H_ctx are spliced and then input to the refining encoder Enc_ref to obtain the common sense context vector under category r,
where [ ; ] denotes the splicing operation.
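A minimal sketch of the refining step is given below, assuming a small Transformer encoder as Enc_ref; the splicing layout (prepending the pooled common sense vector to the context states) is an assumption of this sketch.

```python
# Illustrative sketch: a refining encoder that splices the context-related
# common sense vector with the context vector H_ctx and re-encodes the result.
import torch
import torch.nn as nn

class RefiningEncoder(nn.Module):
    def __init__(self, d_model=300, n_heads=6, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)   # Enc_ref

    def forward(self, cs_vector, H_ctx):
        # cs_vector: (batch, d_model) context-related common sense vector for one category r
        # H_ctx:     (batch, seq_len, d_model) context vector
        spliced = torch.cat([cs_vector.unsqueeze(1), H_ctx], dim=1)   # the [ ; ] splicing operation
        return self.encoder(spliced)             # common sense context vector under category r
```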
In one possible implementation, the emotion context vector of each dialogue record and the common sense context vectors under all common sense categories are spliced to obtain a vector fusing emotion and common sense, and an activation function is applied to highlight the important features of the fused vector to obtain the context vector fusing dual features, specifically as follows:
the common sense context vectors under all common sense categories are spliced to obtain the common sense context vector H_cs-ctx;
the emotion context vector and the common sense context vector are spliced to obtain the vector H_emo-cs-ctx fusing emotion and common sense;
a Sigmoid function is applied to H_emo-cs-ctx and the result is multiplied element-wise with H_emo-cs-ctx, which is then input to a multi-layer perceptron with a ReLU activation function to obtain the context vector fusing dual features:
H_dual-ctx = MLP(σ(H_emo-cs-ctx) ⊙ H_emo-cs-ctx)
where σ(·) is the Sigmoid function, ⊙ is element-wise multiplication, and MLP is the multi-layer perceptron.
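For illustration, a sketch of this fusion step follows, under the assumption that the emotion context vector and all common sense context vectors have been aligned to the same sequence length; hidden sizes and the MLP depth are placeholders.

```python
# Illustrative sketch: fusing the emotion context vector with the common sense
# context vectors through a Sigmoid gate and a ReLU multi-layer perceptron.
# Inputs are assumed to share the same sequence length.
import torch
import torch.nn as nn

class DualFeatureFusion(nn.Module):
    def __init__(self, d_model=300, n_categories=4):
        super().__init__()
        d_cat = d_model * (1 + n_categories)   # emotion context + one common sense context per category
        self.mlp = nn.Sequential(nn.Linear(d_cat, d_model), nn.ReLU(), nn.Linear(d_model, d_model))

    def forward(self, H_emo_ctx, cs_ctx_list):
        # H_emo_ctx:   (batch, seq_len, d_model) emotion context vector
        # cs_ctx_list: list of (batch, seq_len, d_model) common sense context vectors, one per category
        H_cs_ctx = torch.cat(cs_ctx_list, dim=-1)                 # splice all common sense context vectors
        H_emo_cs_ctx = torch.cat([H_emo_ctx, H_cs_ctx], dim=-1)   # vector fusing emotion and common sense
        gated = torch.sigmoid(H_emo_cs_ctx) * H_emo_cs_ctx        # Sigmoid gate highlights important features
        return self.mlp(gated)                                    # context vector fusing dual features
```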
In one possible implementation, the mixed emotion matrix is obtained as follows:
for each moment, the start marker [CLS] is prepended to the corresponding utterance, and the corresponding word embedding vectors are then input to the context encoder Enc_ctx for encoding to obtain the feature vector x_t:
x_t = Enc_ctx(E_W(u_t))[0]
where x_t is taken from the [CLS] position of the context encoder's hidden layer;
the feature vector x_t of the utterance at the current moment and the emotion state h_{t-1} of the utterance at the previous moment are input together into the gated recurrent unit GRU to compute the emotion state h_t at the current moment, which is then passed on to the next GRU step:
h_t = GRU(h_{t-1}, x_t)
the above process is repeated until the emotion state h_{k-1} corresponding to the last moment is computed; this state contains both the emotion information of that moment and the emotion transfer information of the historical dialogue;
the emotion state h_{k-1} corresponding to the last moment is input to a multi-layer perceptron with a Tanh activation function to estimate the reply emotion state h_k:
h_k = MLP(h_{k-1})
the largest several values are selected from the reply emotion state and input to a Softmax layer to obtain the probability that each of these emotion types is expressed, and the computed probabilities are multiplied by the corresponding emotion vectors to obtain the mixed emotion matrix.
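A minimal sketch of this temporal emotion reasoning step follows, assuming a GRU cell over per-utterance [CLS] features and a learnable emotion embedding table; the number of emotions and the top-k value (7 in the embodiment described later) are placeholder assumptions.

```python
# Illustrative sketch: a GRU tracks the emotion state across the historical
# utterances, an MLP with Tanh estimates the reply emotion state, and the
# top-k emotions form the mixed emotion matrix. Sizes and k are assumptions.
import torch
import torch.nn as nn

class TemporalEmotionReasoner(nn.Module):
    def __init__(self, d_model=300, n_emotions=32, top_k=7):
        super().__init__()
        self.gru_cell = nn.GRUCell(d_model, d_model)            # emotion state transfer between moments
        self.mlp = nn.Sequential(nn.Linear(d_model, n_emotions), nn.Tanh())
        self.emo_emb = nn.Embedding(n_emotions, d_model)        # emotion vectors (from the emotion embedding)
        self.top_k = top_k

    def forward(self, utt_features):
        # utt_features: (batch, k-1, d_model); each x_t is the [CLS] feature of utterance u_t
        batch, steps, d = utt_features.shape
        h = utt_features.new_zeros(batch, d)
        for t in range(steps):
            h = self.gru_cell(utt_features[:, t], h)            # h_t = GRU(h_{t-1}, x_t)
        scores = self.mlp(h)                                    # estimated reply emotion state h_k
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)     # largest top_k values
        probs = torch.softmax(top_vals, dim=-1)                 # probability each emotion is expressed
        return probs.unsqueeze(-1) * self.emo_emb(top_idx)      # mixed emotion matrix: (batch, top_k, d_model)
```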
In one possible implementation, the reply utterance is generated based on the context vector fusing dual features and the mixed emotion matrix, specifically as follows:
the dual-feature context vector is used as the key vector and value vector and input to a cross attention module to obtain the vector A_ec,
where O, the hidden vector of the multi-source decoder, serves as the query;
the mixed emotion matrix is used as the key vector and value vector and input to another cross attention module to obtain the vector A_er;
the vector A_ec and the vector A_er are spliced, and the features from the emotion source and the common sense source are input to a fully connected layer to iteratively update the generated reply utterance.
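As a non-limiting sketch, the snippet below shows a single multi-source decoding layer with two parallel cross-attention modules whose outputs are spliced and projected by a fully connected layer; using the decoder hidden states as queries and the two sources as keys/values is an assumption consistent with the description above.

```python
# Illustrative sketch: one multi-source decoding step. The decoder hidden
# vector O queries two parallel cross-attention modules, one over the
# dual-feature context vector and one over the mixed emotion matrix.
import torch
import torch.nn as nn

class MultiSourceCrossAttention(nn.Module):
    def __init__(self, d_model=300, n_heads=6, vocab_size=30000):
        super().__init__()
        self.attn_ec = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # over dual context
        self.attn_er = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # over mixed emotion matrix
        self.out = nn.Linear(2 * d_model, vocab_size)                             # fully connected layer

    def forward(self, O, dual_ctx, mixed_emotion):
        # O:             (batch, tgt_len, d_model) multi-source decoder hidden vector (query)
        # dual_ctx:      (batch, src_len, d_model) context vector fusing dual features (key/value)
        # mixed_emotion: (batch, top_k, d_model) mixed emotion matrix (key/value)
        A_ec, _ = self.attn_ec(O, dual_ctx, dual_ctx)
        A_er, _ = self.attn_er(O, mixed_emotion, mixed_emotion)
        return self.out(torch.cat([A_ec, A_er], dim=-1))   # logits used to iteratively generate the reply
```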
In a second aspect, the present invention provides a co-emotion dialog generation system, comprising:
a data set acquisition module for acquiring a dialog data set comprising a plurality of dialog records, each dialog record comprising a context utterance and a reply utterance;
The common sense reasoning module is used for inputting the last utterance of the context in each dialogue record, together with the preset common sense knowledge categories, into the common sense library to obtain a common sense reasoning set for each dialogue record, splicing the common sense reasoning set of each dialogue record under each common sense knowledge category to obtain a corresponding common sense sequence, encoding the common sense sequence, and performing attention computation to obtain a context-related common sense vector under each common sense category;
the emotion labeling and splicing module is used for performing emotion labeling on each utterance in each dialogue record to obtain the context utterance emotion labels and the reply utterance emotion label corresponding to each dialogue record, splicing the emotion-labeled context utterances of each dialogue record to obtain a context sequence, obtaining the context vector of each dialogue record from the word embedding vector and dialogue state embedding vector corresponding to the context sequence, and obtaining the emotion context vector of each dialogue record from the word embedding vector, dialogue state embedding vector and emotion embedding vector corresponding to the context sequence;
the dual feature fusion module is used for splicing the context vector of each dialogue record with the context-related common sense vector under each common sense category to obtain the common sense context vector of each dialogue record under that category, splicing the emotion context vector of each dialogue record with the common sense context vectors under all common sense categories to obtain a vector fusing emotion and common sense, and applying an activation function to highlight the important features of the fused vector to obtain a context vector fusing dual features;
the emotion matrix prediction module is used for sequentially determining the emotion state of the dialogue at each moment based on the historical dialogue record and a gated recurrent unit, inputting the emotion state at the last moment into a multi-layer perceptron with an activation function to estimate the emotion state of the reply utterance, then computing the probability that each kind of emotion is expressed through a normalization function, and obtaining the corresponding mixed emotion matrix from the emotion probabilities and the corresponding emotion vector representations;
and the co-emotion utterance reply module is used for generating the reply utterance of the dialogue record based on the context vector fusing dual features and the mixed emotion matrix.
In one possible implementation, the emotion matrix prediction module obtains the mixed emotion matrix as follows:
for each moment, the start marker [CLS] is prepended to the corresponding utterance, and the corresponding word embedding vectors are then input to the context encoder Enc_ctx for encoding to obtain the feature vector x_t:
x_t = Enc_ctx(E_W(u_t))[0]
where x_t is taken from the [CLS] position of the context encoder's hidden layer;
the feature vector x_t of the utterance at the current moment and the emotion state h_{t-1} of the utterance at the previous moment are input together into the gated recurrent unit GRU to compute the emotion state h_t at the current moment, which is then passed on to the next GRU step:
h_t = GRU(h_{t-1}, x_t)
the above process is repeated until the emotion state h_{k-1} corresponding to the last moment is computed; this state contains both the emotion information of that moment and the emotion transfer information of the historical dialogue;
the emotion state h_{k-1} corresponding to the last moment is input to a multi-layer perceptron with a Tanh activation function to estimate the reply emotion state h_k:
h_k = MLP(h_{k-1})
the largest several values are selected from the reply emotion state and input to a Softmax layer to obtain the probability that each of these emotion types is expressed, and the computed probabilities are multiplied by the corresponding emotion vectors to obtain the mixed emotion matrix.
In one possible implementation, the co-emotion utterance reply module uses the dual-feature context vector as the key vector and value vector and inputs it to a cross attention module to obtain the vector A_ec, where O, the hidden vector of the multi-source decoder, serves as the query; uses the mixed emotion matrix as the key vector and value vector and inputs it to another cross attention module to obtain the vector A_er; and splices the vector A_ec and the vector A_er, inputting the features from the emotion source and the common sense source to a fully connected layer to iteratively update the generated reply utterance.
In a third aspect, the present invention provides an electronic device comprising: at least one memory for storing a program; at least one processor for executing a memory-stored program, which when executed is adapted to carry out the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the invention provides a computer program product which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In general, the above technical solutions conceived by the present invention have the following beneficial effects compared with the prior art:
the invention provides a co-emotion dialogue generation method, which considers the defects of emotion perception and co-emotion expression caused by modeling emotion or common sense knowledge only, adopts a mode of combining modeling emotion and common sense knowledge to accurately understand the feeling and situation of a user, and enables the generated reply to have more co-emotion expression. Further, the invention estimates the emotion state of the reply by learning the emotion relation between the utterances in the historical dialogue sequence, and the state provides important emotion information for the reply generation process, thereby improving the consistency of the generated reply and the real scene in emotion and further meeting the psychological needs of the user.
Drawings
FIG. 1 is a schematic flow diagram of the co-emotion dialogue generation method with joint emotion and common sense knowledge modeling and temporal emotion reasoning provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the co-emotion dialogue generation system model with joint emotion and common sense knowledge modeling and temporal emotion reasoning provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of the perplexity change of the co-emotion dialogue generation system model with joint emotion and common sense knowledge modeling and temporal emotion reasoning provided by an embodiment of the invention;
FIG. 4 is a schematic diagram of a reply generated by the co-emotion dialogue generation system model with joint emotion and common sense knowledge modeling and temporal emotion reasoning provided by an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a co-emotion dialogue generation method, which comprises the following steps:
S101, acquiring a dialogue data set, wherein the dialogue data set comprises a plurality of dialogue records, and each dialogue record comprises a context utterance and a reply utterance;
S102, inputting the last utterance of the context in each dialogue record, together with the preset common sense knowledge categories, into a common sense library to obtain a common sense reasoning set for each dialogue record; splicing the common sense reasoning set of each dialogue record under each common sense knowledge category to obtain a corresponding common sense sequence, encoding the common sense sequence, and performing attention computation to obtain a context-related common sense vector under each common sense category;
S103, performing emotion labeling on each utterance in each dialogue record to obtain the context utterance emotion labels and the reply utterance emotion label corresponding to each dialogue record; splicing the emotion-labeled context utterances of each dialogue record to obtain a context sequence, obtaining the context vector of each dialogue record from the word embedding vector and dialogue state embedding vector corresponding to the context sequence, and obtaining the emotion context vector of each dialogue record from the word embedding vector, dialogue state embedding vector and emotion embedding vector corresponding to the context sequence;
S104, splicing the context vector of each dialogue record with the context-related common sense vector under each common sense category to obtain the common sense context vector of each dialogue record under that category; splicing the emotion context vector of each dialogue record with the common sense context vectors under all common sense categories to obtain a vector fusing emotion and common sense, and applying an activation function to highlight the important features of the fused vector to obtain a context vector fusing dual features;
S105, sequentially determining the emotion state of the dialogue at each moment based on the historical dialogue record and a gated recurrent unit, inputting the emotion state at the last moment into a multi-layer perceptron with an activation function to estimate the emotion state of the reply utterance, then computing the probability that each kind of emotion is expressed through a normalization function, and obtaining the corresponding mixed emotion matrix from the emotion probabilities and the corresponding emotion vector representations;
S106, generating the reply utterance of the dialogue record based on the context vector fusing dual features and the mixed emotion matrix.
It should be noted that in the training phase, the model of the invention learns how to generate a reply from the context based on the provided contexts and replies. Once the model is trained and put into use, the corresponding reply can be generated automatically from the context.
Specifically, the invention discloses a co-emotion dialogue generation method and system integrating joint emotion and common sense knowledge modeling with temporal emotion reasoning, relating to the field of natural language processing. The method comprises the following steps: S1, acquiring a dialogue data set; S2, performing emotion labeling on the data set and acquiring common sense knowledge from a common sense knowledge base; S3, encoding the labeled emotion and the acquired common sense knowledge to obtain an emotion context vector and a common sense context vector; S4, combining the emotion context vector and the common sense context vector into a dual-feature context vector; S5, estimating the emotion state of the reply from the historical dialogue sequence and converting the state into a mixed emotion matrix; S6, inputting the dual-feature context vector and the mixed emotion matrix to a multi-source decoder to generate a reply. According to the invention, emotion perception and co-emotion expression are enhanced through joint modeling of emotion and common sense knowledge, and the reasoning of the reply emotion state makes the generated reply emotionally consistent with the real scene.
Fig. 1 is a schematic flow chart of the co-emotion dialogue generation method integrating joint emotion and common sense knowledge modeling with temporal emotion reasoning according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
S1, acquiring a dialogue data set, wherein the dialogue data set comprises a plurality of dialogue records, and each dialogue record comprises a context D = [u_1, u_2, ..., u_{k-1}] and a reply utterance u_k, the context containing k-1 utterances;
S2, performing emotion labeling on the data set and acquiring common sense knowledge from a common sense knowledge base; step S2 includes the following steps:
S21, automatically labeling the emotion of each utterance in the context and the emotion of the reply utterance with an emotion classification pre-training model to obtain a context emotion label sequence E = [e_1, e_2, ..., e_{k-1}] and a reply utterance emotion label e_k, where each e is one of several emotion classes.
S22, the last utterance u_k of the context and the common sense type r are input into the common sense library COMET to obtain the common sense set S_r:
S_r = COMET(r, u_k)
where r ∈ {xWant (someone wants), xNeed (someone needs), xIntent (someone intends), xEffect (the effect on someone)}, and each common sense set contains five common sense inferences.
In one specific example, the common sense type r yields the 4 common sense sets, and each set contains five common sense inferences, for example:
xWant (someone wants): ["escape", "fear", "sadness", "injury", "difficulty"]
xNeed (someone needs): ["ensure that they are not in charge", "ensure that they are safe", "ensure that they are not in charge", "need to be in charge"]
xIntent (someone intends): ["making a call for help", "sorry for them", "counterattack", "sorry", "help"]
xEffect (the effect on someone): ["they get a ticket", "they don't work", "they are injured", "they are safe"]
S3, encoding the labeled emotion and the acquired common sense knowledge to obtain an emotion context vector and a common sense context vector; step S3 includes the following steps:
S31, splicing the k-1 utterances of the dialogue context to obtain the context sequence C;
S32, adding the word embedding E_W(C), dialogue state embedding E_D(C) and emotion embedding E_E(C) corresponding to the context sequence and inputting the sum to the emotion encoder to obtain the emotion context vector H_emo-ctx:
H_emo-ctx = Enc_emo(E_W(C) + E_E(C) + E_D(C))
S33, splicing the common sense inferences in each common sense set to obtain the common sense sequence CS_r, inputting it to the common sense encoder to obtain the common sense vector, and finally computing the context-related common sense vector through the attention mechanism to reduce the noise introduced by common sense reasoning:
α = softmax(e)
where e is the score vector computed from the context vector corresponding to u_k, W_a and v are trainable parameters of the fully connected layer, and L_r is the length of the common sense sequence CS_r; the context-related common sense vector is obtained as the α-weighted combination of the common sense vector.
S34, adding the word embedding E_W(C) and dialogue state embedding E_D(C) corresponding to the context sequence and inputting the sum to the context encoder to obtain the context vector H_ctx:
H_ctx = Enc_ctx(E_W(C) + E_D(C))
S35, splicing the context-related common sense vector and the context vector H_ctx and inputting the result to the refining encoder to obtain the common sense context vector under category r,
where [ ; ] denotes the splicing operation.
S4, combining the emotion context vector and the common sense context vector into a dual-feature context vector; step S4 includes the following steps:
S41, splicing the emotion context vector and all common sense context vectors to obtain the vector H_emo-cs-ctx;
S42, applying a Sigmoid function to H_emo-cs-ctx, multiplying the result element-wise with H_emo-cs-ctx, and inputting it to a multi-layer perceptron with a ReLU activation function to highlight the important features of H_emo-cs-ctx and obtain the dual-feature context vector:
H_dual-ctx = MLP(σ(H_emo-cs-ctx) ⊙ H_emo-cs-ctx)
where σ(·) is the Sigmoid function and ⊙ is element-wise multiplication.
S5, estimating the emotion state of the reply from the historical dialogue sequence and converting the state into a mixed emotion matrix; step S5 includes the following steps:
S51, for each moment, prepending the start marker [CLS] to the corresponding utterance, then inputting the corresponding word embeddings into the context encoder for encoding to obtain the feature vector x_t:
x_t = Enc_ctx(E_W(u_t))[0]
where x_t is taken from the [CLS] position of the context encoder's hidden layer.
S52, inputting the feature vector x_t of the utterance at the current moment and the emotion state h_{t-1} of the utterance at the previous moment together into the gated recurrent unit GRU to compute the emotion state h_t at the current moment, which is then passed on to the next GRU step:
h_t = GRU(h_{t-1}, x_t)
S53, repeating the above process until the emotion state h_{k-1} corresponding to the last moment is computed; this state contains both the emotion information of that moment and the emotion transfer information of the historical dialogue; the emotion state h_{k-1} corresponding to the last moment is input to a multi-layer perceptron with a Tanh activation function to estimate the reply emotion state h_k:
h_k = MLP(h_{k-1})
S54, selecting the 7 largest values from the reply emotion state, inputting these 7 values into a Softmax layer to obtain the probability that each of the 7 emotions is expressed, and multiplying the computed probabilities by the corresponding emotion vectors to obtain the mixed emotion matrix, where the emotion vectors come from the emotion embedding.
S6, inputting the dual-feature context vector and the mixed emotion matrix to a multi-source decoder to generate a reply; step S6 includes the following steps:
S61, using the dual-feature context vector as the key vector and value vector and inputting it into a cross attention module to obtain the vector A_ec,
where O, the hidden vector of the multi-source decoder, serves as the query.
S62, using the mixed emotion matrix as the key vector and value vector and inputting it into another cross attention module to obtain the vector A_er.
S63, splicing the vector A_ec and the vector A_er and inputting the result to a fully connected layer to integrate the features of the two sources and generate the reply.
The invention further discloses a co-emotion dialogue generation system integrating joint emotion and common sense knowledge modeling with temporal emotion reasoning, as shown in Fig. 2, comprising the following components:
an emotion labeling and common sense acquisition module, used for labeling the emotion of each utterance and acquiring common sense knowledge from the COMET common sense knowledge base;
an attention-based common sense acquisition module, configured to encode the acquired common sense knowledge and then apply an attention mechanism to focus on the context-related common sense knowledge, obtaining a context-related common sense vector;
a dual-view joint encoder, used for encoding the labeled utterance emotions and the context-related common sense vectors and integrating them into a dual-feature context vector;
a temporal emotion reasoner, used for learning the emotional relations between utterances from the historical dialogue sequence to estimate the emotion state of the reply, and then converting this state into a mixed emotion matrix;
a multi-source decoder, used for injecting the dual-feature context vector and the mixed emotion matrix into two parallel cross attention modules and then generating the reply.
Preferably, the attention-based common sense acquisition module applies an attention mechanism between the context vector of the last utterance and the common sense vector to obtain the context-related common sense vector.
Preferably, the dual-view joint encoder includes:
an emotion context encoding module, which encodes the labeled utterance emotions into the context through emotion embedding to obtain the emotion context vector;
a common sense context encoding module, which encodes the context-related common sense vectors into the context through splicing to obtain the common sense context vectors;
a feature fusion module, which splices the emotion context vector and the common sense context vectors and then fuses them into the dual-feature context vector through a multi-layer perceptron.
Preferably, the temporal emotion reasoner extracts the feature vector of each utterance, establishes emotional connections between utterances through a single-layer gated recurrent unit GRU, and computes the emotion state of each utterance; the emotion state of the last utterance is input into a multi-layer perceptron to estimate the reply emotion state, which is then constructed into a mixed emotion matrix to regulate the co-emotion expression.
Preferably, the multi-source decoder inputs the dual-feature context vector and the mixed emotion matrix into two internal cross attention modules respectively to generate feature vectors from the two sources, splices the two feature vectors, and inputs them into a fully connected layer to integrate the features output by the two modules and generate the reply.
Fig. 3 is a schematic diagram of the perplexity change of the co-emotion dialogue generation system model integrating joint emotion and common sense knowledge modeling with temporal emotion reasoning according to an embodiment of the present invention; it can be seen that the perplexity decreases continuously with the number of training iterations, and the model performance reaches its optimum at 20000 iterations.
Fig. 4 is a schematic diagram of a reply generated by the co-emotion dialogue generation system model integrating joint emotion and common sense knowledge modeling with temporal emotion reasoning according to an embodiment of the present invention; the system can be seen to generate, from a given context, a reply that is related to the context and contains multiple emotion words, indicating that the present technical solution achieves the expected effect.
It should be noted that in any of the above embodiments the steps need not be executed in the order of their numbering; unless the execution logic requires a specific order, the steps may be executed in any other feasible order.
In another specific embodiment, the present invention provides a co-emotion dialog generation system, comprising:
a data set acquisition module for acquiring a dialog data set comprising a plurality of dialog records, each dialog record comprising a context utterance and a reply utterance;
the common sense reasoning module is used for inputting the last utterance of the context in each dialogue record, together with the preset common sense knowledge categories, into the common sense library to obtain a common sense reasoning set for each dialogue record, splicing the common sense reasoning set of each dialogue record under each common sense knowledge category to obtain a corresponding common sense sequence, encoding the common sense sequence, and performing attention computation to obtain a context-related common sense vector under each common sense category;
the emotion labeling and splicing module is used for performing emotion labeling on each utterance in each dialogue record to obtain the context utterance emotion labels and the reply utterance emotion label corresponding to each dialogue record, splicing the emotion-labeled context utterances of each dialogue record to obtain a context sequence, obtaining the context vector of each dialogue record from the word embedding vector and dialogue state embedding vector corresponding to the context sequence, and obtaining the emotion context vector of each dialogue record from the word embedding vector, dialogue state embedding vector and emotion embedding vector corresponding to the context sequence;
the dual feature fusion module is used for splicing the context vector of each dialogue record with the context-related common sense vector under each common sense category to obtain the common sense context vector of each dialogue record under that category, splicing the emotion context vector of each dialogue record with the common sense context vectors under all common sense categories to obtain a vector fusing emotion and common sense, and applying an activation function to highlight the important features of the fused vector to obtain a context vector fusing dual features;
the emotion matrix prediction module is used for sequentially determining the emotion state of the dialogue at each moment based on the historical dialogue record and a gated recurrent unit, inputting the emotion state at the last moment into a multi-layer perceptron with an activation function to estimate the emotion state of the reply utterance, then computing the probability that each kind of emotion is expressed through a normalization function, and obtaining the corresponding mixed emotion matrix from the emotion probabilities and the corresponding emotion vector representations;
and the co-emotion utterance reply module is used for generating the reply utterance of the dialogue record based on the context vector fusing dual features and the mixed emotion matrix.
It should be understood that, the system is used to execute the method in the foregoing embodiment, and corresponding program modules in the apparatus implement principles and technical effects similar to those described in the foregoing method, and the working process of the apparatus may refer to the corresponding process in the foregoing method, which is not repeated herein.
Based on the method in the above embodiment, the embodiment of the invention provides an electronic device. The apparatus may include: at least one memory for storing programs and at least one processor for executing the programs stored by the memory. Wherein the processor is adapted to perform the method described in the above embodiments when the program stored in the memory is executed.
Based on the method in the above embodiment, the embodiment of the present invention provides a computer-readable storage medium storing a computer program, which when executed on a processor, causes the processor to perform the method in the above embodiment.
Based on the method in the above embodiments, an embodiment of the present invention provides a computer program product, which when run on a processor causes the processor to perform the method in the above embodiments.
It is to be appreciated that the processor in embodiments of the invention may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general purpose processor may be a microprocessor, but in the alternative it may be any conventional processor.
The method steps in the embodiments of the present invention may be implemented by hardware, or may be implemented by a processor executing software instructions. The software instructions may be composed of corresponding software modules that may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present invention are merely for ease of description and are not intended to limit the scope of the embodiments of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A method for generating a co-emotion conversation, comprising the steps of:
obtaining a dialog data set comprising a plurality of dialog records, each dialog record comprising a context utterance and a reply utterance;
inputting the last utterance of the context in each dialogue record, together with the preset common sense knowledge categories, into a common sense library to obtain a common sense reasoning set for each dialogue record; splicing the common sense reasoning set of each dialogue record under each common sense knowledge category to obtain a corresponding common sense sequence, encoding the common sense sequence, and performing attention computation to obtain a context-related common sense vector under each common sense category;
performing emotion labeling on each utterance in each dialogue record to obtain the context utterance emotion labels and the reply utterance emotion label corresponding to each dialogue record; splicing the emotion-labeled context utterances of each dialogue record to obtain a context sequence, obtaining the context vector of each dialogue record from the word embedding vector and dialogue state embedding vector corresponding to the context sequence, and obtaining the emotion context vector of each dialogue record from the word embedding vector, dialogue state embedding vector and emotion embedding vector corresponding to the context sequence;
splicing the context vector of each dialogue record with the context-related common sense vector under each common sense category to obtain the common sense context vector of each dialogue record under that category; splicing the emotion context vector of each dialogue record with the common sense context vectors under all common sense categories to obtain a vector fusing emotion and common sense, and applying an activation function to highlight the important features of the fused vector to obtain a context vector fusing dual features;
sequentially determining the emotion state of the dialogue at each moment based on the historical dialogue record and a gated recurrent unit, inputting the emotion state at the last moment into a multi-layer perceptron with an activation function to estimate the emotion state of the reply utterance, then computing the probability that each kind of emotion is expressed through a normalization function, and obtaining the corresponding mixed emotion matrix from the emotion probabilities and the corresponding emotion vector representations;
generating the reply utterance of the dialogue record based on the context vector fusing dual features and the mixed emotion matrix.
2. The method of claim 1, wherein each utterance in each dialogue record is annotated with emotion to obtain a context utterance emotion tag and a reply utterance emotion tag corresponding to each dialogue record, specifically:
automatically labeling the emotion of each utterance in the context and the emotion of the reply utterance with an emotion classification pre-training model to obtain a context emotion label sequence E = [e_1, e_2, ..., e_{k-1}] and a reply utterance emotion label e_k, where each e is one of multiple emotion classes;
inputting the last utterance of the context in each dialogue record, together with the preset common sense knowledge categories, into a common sense library to obtain a common sense reasoning set for each dialogue record, specifically as follows:
the last utterance u_k of the context and a preset common sense type r are input into the common sense library COMET to obtain the common sense set S_r:
S_r = COMET(r, u_k)
the preset common sense type r comprises a plurality of common sense categories, and the common sense set corresponding to each category comprises five common sense inferences.
3. The method of claim 2, wherein the context vector and emotion context vector for each dialog record are determined by:
the k-1 utterances of the dialogue context [u_1, u_2, ..., u_{k-1}] are spliced to obtain the context sequence C;
the word embedding vector E_W(C), the dialogue state embedding vector E_D(C) and the utterance emotion embedding vector E_E(C) corresponding to the context sequence are added and input to the emotion encoder Enc_emo to obtain the emotion context vector H_emo-ctx:
H_emo-ctx = Enc_emo(E_W(C) + E_E(C) + E_D(C))
the common sense inferences in each common sense set are spliced to obtain the common sense sequence CS_r, which is then input to the common sense encoder Enc_cs to obtain the common sense vector; finally, the context-related common sense vector is computed through the attention mechanism:
α = softmax(e)
where E_W is the word embedding layer, e is the score vector computed from the context vector corresponding to u_k and the common sense vector, α is the alignment vector, W_a and v are trainable parameters of the fully connected layer, L_r is the length of the common sense sequence CS_r, and α_i is the score corresponding to the i-th word in the score vector; the context-related common sense vector is obtained as the α-weighted combination of the common sense vector;
the word embedding vector E_W(C) and the dialogue state embedding vector E_D(C) corresponding to the context sequence are added and input to the context encoder Enc_ctx to obtain the context vector H_ctx:
H_ctx = Enc_ctx(E_W(C) + E_D(C))
the context-related common sense vector and the context vector H_ctx are spliced and then input to the refining encoder Enc_ref to obtain the common sense context vector under category r,
where [ ; ] denotes the splicing operation.
4. A method according to claim 3, wherein the emotion context vector of each dialogue record and the common sense context vectors under all common sense categories are concatenated to obtain a vector fusing emotion and common sense, and the important features of the fused vector are highlighted by an activation function to obtain a context vector fusing dual features, specifically:
splicing the common sense context vectors under all common sense categories to obtain the common sense context vector H_cs-ctx;
splicing the emotion context vector and the common sense context vector to obtain the vector H_emo-cs-ctx fusing emotion and common sense;
applying a Sigmoid function to H_emo-cs-ctx, multiplying the result element-wise with H_emo-cs-ctx, and inputting it to a multi-layer perceptron with a ReLU activation function to obtain the context vector fusing dual features:
H_dual-ctx = MLP(σ(H_emo-cs-ctx) ⊙ H_emo-cs-ctx)
where σ(·) is the Sigmoid function, ⊙ is element-wise multiplication, and MLP is the multi-layer perceptron.
5. The method of claim 4, wherein the mixed emotion matrix is obtained by:
for each moment, the start marker [CLS] is prepended to the corresponding utterance, and the corresponding word embedding vectors are then input to the context encoder Enc_ctx for encoding to obtain the feature vector x_t:
x_t = Enc_ctx(E_W(u_t))[0]
where x_t is taken from the [CLS] position of the context encoder's hidden layer;
the feature vector x_t of the utterance at the current moment and the emotion state h_{t-1} of the utterance at the previous moment are input together into the gated recurrent unit GRU to compute the emotion state h_t at the current moment, which is then passed on to the next GRU step:
h_t = GRU(h_{t-1}, x_t)
the above process is repeated until the emotion state h_{k-1} corresponding to the last moment is computed; this state contains both the emotion information of that moment and the emotion transfer information of the historical dialogue;
the emotion state h_{k-1} corresponding to the last moment is input to a multi-layer perceptron with a Tanh activation function to estimate the reply emotion state h_k:
h_k = MLP(h_{k-1})
the largest several values are selected from the reply emotion state and input to a Softmax layer to obtain the probability that each of these emotion types is expressed, and the computed probabilities are multiplied by the corresponding emotion vectors to obtain the mixed emotion matrix.
6. The method according to claim 5, wherein a reply utterance is generated based on the context vector of the fused dual features and a mixed emotion matrix, in particular:
the dual-feature context vector is used as the key vector and value vector and input to a cross attention module to obtain the vector A_ec,
where O, the hidden vector of the multi-source decoder, serves as the query;
the mixed emotion matrix is used as the key vector and value vector and input to another cross attention module to obtain the vector A_er;
the vector A_ec and the vector A_er are spliced, and the features from the emotion source and the common sense source are input to a fully connected layer to iteratively update the generated reply utterance.
7. A co-emotion conversation generation system, comprising:
a data set acquisition module for acquiring a dialog data set comprising a plurality of dialog records, each dialog record comprising a context utterance and a reply utterance;
The common sense reasoning module is used for inputting the last sentence in the context words in each dialogue record and the preset common sense knowledge category into the common sense library to obtain a common sense reasoning set of each dialogue record; and splicing the common sense reasoning sets recorded by each dialogue under each common sense knowledge category to obtain a corresponding common sense sequence, encoding the common sense sequence, and performing attention mechanism calculation to obtain a common sense vector which is related to the context under each common sense category;
the emotion marking splicing module is used for carrying out emotion marking on each utterance in each dialogue record to obtain a context utterance emotion label and a reply utterance emotion label corresponding to each dialogue record; splicing the context utterances of each dialogue record of the emotion marking to obtain a context sequence, obtaining a context vector of each dialogue record based on a word embedded vector and a dialogue state embedded vector corresponding to the context sequence, and obtaining an emotion context vector of each dialogue record based on the word embedded vector, the dialogue state embedded vector and the emotion embedded vector corresponding to the context sequence;
a dual feature fusion module for respectively splicing the context vector of each dialogue record with the context-related common sense vector under each common sense category to obtain the common sense context vector of each dialogue record under each common sense category; splicing the emotion context vector of each dialogue record with the common sense context vectors under all common sense categories to obtain a vector fusing emotion and common sense, and applying an activation function to highlight the important features of the fused vector to obtain a context vector fusing the dual features (a sketch of this fusion is given after this claim);
an emotion matrix prediction module for sequentially determining the emotion state of the dialogue at each time step based on the historical dialogue record and a gated recurrent unit, inputting the emotion state at the last time step into a multi-layer perceptron with an activation function to estimate the emotion state of the reply utterance, and further computing, through a normalization function, the probability that each of several emotions is expressed, so as to obtain the corresponding mixed emotion matrix based on the emotion probabilities and the corresponding emotion vector representations;
and a co-emotion utterance reply module for generating the reply utterance of the dialogue record based on the context vector fusing the dual features and the mixed emotion matrix.
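To make the module composition above concrete, here is a minimal sketch of the dual feature fusion step, assuming linear projections for the splicing operations and a Sigmoid-based gate as the "activation function that highlights important features"; all class, parameter and dimension names are hypothetical and not taken from the patent.

```python
import torch
import torch.nn as nn

class DualFeatureFusion(nn.Module):
    """Sketch of the dual feature fusion module: splice context, common-sense and emotion vectors."""
    def __init__(self, hidden_dim=768, num_cs_categories=5):
        super().__init__()
        self.cs_proj = nn.Linear(2 * hidden_dim, hidden_dim)       # context vector ++ one common-sense vector
        self.fuse_proj = nn.Linear((num_cs_categories + 1) * hidden_dim, hidden_dim)
        self.gate = nn.Sigmoid()                                   # assumed activation to highlight features

    def forward(self, context_vec, emotion_context_vec, commonsense_vecs):
        # context_vec:          (B, hidden_dim)     context vector of the dialogue record
        # emotion_context_vec:  (B, hidden_dim)     emotion context vector of the dialogue record
        # commonsense_vecs:     (B, C, hidden_dim)  one context-related common-sense vector per category
        cs_context = [self.cs_proj(torch.cat([context_vec, commonsense_vecs[:, c]], dim=-1))
                      for c in range(commonsense_vecs.size(1))]    # common-sense context vectors
        fused = self.fuse_proj(torch.cat([emotion_context_vec] + cs_context, dim=-1))
        return fused * self.gate(fused)                            # highlight important features
```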
8. The system of claim 7, wherein the emotion matrix prediction module obtains the mixed emotion matrix by:
for each time step, the start marker [CLS] is prepended to the corresponding utterance, and the corresponding word embedding vectors are then input into a context encoder Enc_ctx to obtain a feature vector x_t:
x_t = Enc_ctx(E_W(u_t))[0]
wherein x_t is taken from the [CLS] position of the hidden layer of the context encoder;
the feature vector x_t of the utterance at the current time step and the emotion state h_{t-1} of the utterance at the previous time step are input together into a gated recurrent unit GRU to compute the emotion state h_t at the current time step, which is then passed on to the next GRU step:
h_t = GRU(h_{t-1}, x_t)
the above process is repeated until the emotion state h_{k-1} corresponding to the last time step is obtained; the emotion state at the last time step contains both the emotion information at that moment and the emotion-transition information of the historical dialogue;
the emotion state h_{k-1} corresponding to the last time step is input into a multi-layer perceptron with a Tanh activation function to estimate the emotion state h_k of the reply:
h_k = MLP(h_{k-1})
the several largest values are selected from the reply emotion state and input into a Softmax layer to obtain the probabilities that the corresponding emotion types are expressed, and the computed probabilities are multiplied with the corresponding emotion vectors to obtain a mixed emotion matrix.
9. The system of claim 8, wherein the co-emotion utterance reply module uses the context vector fusing the dual features as the key vector and the value vector and inputs it into a cross-attention module to obtain a vector A_ec, wherein O is the hidden state vector of the multi-source decoder; uses the mixed reply emotion matrix as the key vector and the value vector and inputs it into another cross-attention module to obtain a vector A_er; and concatenates the vector A_ec with the vector A_er and inputs the features from the emotion source and the common-sense source into a fully connected layer to iteratively update the generated reply words.
10. An electronic device, comprising:
at least one memory for storing a program;
at least one processor for executing the program stored in the memory, wherein the processor is configured to perform the method according to any one of claims 1-6 when the program stored in the memory is executed.
CN202310420171.9A 2023-04-13 2023-04-13 Co-emotion dialogue generation method and system Active CN116680369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310420171.9A CN116680369B (en) 2023-04-13 2023-04-13 Co-emotion dialogue generation method and system

Publications (2)

Publication Number Publication Date
CN116680369A true CN116680369A (en) 2023-09-01
CN116680369B CN116680369B (en) 2023-12-15

Family

ID=87777689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310420171.9A Active CN116680369B (en) 2023-04-13 2023-04-13 Co-emotion dialogue generation method and system

Country Status (1)

Country Link
CN (1) CN116680369B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180204111A1 (en) * 2013-02-28 2018-07-19 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform
CN114676259A (en) * 2022-04-11 2022-06-28 哈尔滨工业大学 Conversation emotion recognition method based on causal perception interactive network
CN115795010A (en) * 2022-11-30 2023-03-14 重庆邮电大学 External knowledge assisted multi-factor hierarchical modeling common-situation dialogue generation method
CN115905485A (en) * 2022-11-14 2023-04-04 复旦大学 Common-situation conversation method and system based on common-sense self-adaptive selection

Also Published As

Publication number Publication date
CN116680369B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN112487182B (en) Training method of text processing model, text processing method and device
WO2022095682A1 (en) Text classification model training method, text classification method and apparatus, device, storage medium, and computer program product
CN113420807A (en) Multi-mode fusion emotion recognition system and method based on multi-task learning and attention mechanism and experimental evaluation method
CN112307168B (en) Artificial intelligence-based inquiry session processing method and device and computer equipment
CN109344242B (en) Dialogue question-answering method, device, equipment and storage medium
CN110399472B (en) Interview question prompting method and device, computer equipment and storage medium
CN115599901B (en) Machine question-answering method, device, equipment and storage medium based on semantic prompt
CN113779310B (en) Video understanding text generation method based on hierarchical representation network
CN112115687A (en) Problem generation method combining triples and entity types in knowledge base
CN116681810B (en) Virtual object action generation method, device, computer equipment and storage medium
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN111027681B (en) Time sequence data processing model training method, data processing method, device and storage medium
Khan et al. A deep neural framework for image caption generation using gru-based attention mechanism
CN115186147A (en) Method and device for generating conversation content, storage medium and terminal
Das et al. Recurrent neural networks (RNNs): architectures, training tricks, and introduction to influential research
CN115617972B (en) Robot dialogue method, device, electronic equipment and storage medium
CN112307179A (en) Text matching method, device, equipment and storage medium
CN116702765A (en) Event extraction method and device and electronic equipment
CN116680369B (en) Co-emotion dialogue generation method and system
CN116150334A (en) Chinese co-emotion sentence training method and system based on UniLM model and Copy mechanism
CN115169333A (en) Text entity identification method, device, equipment, storage medium and program product
Thorat et al. Improving conversation modelling using attention based variational hierarchical RNN
Afrae et al. Smart Sustainable Cities: A Chatbot Based on Question Answering System Passing by a Grammatical Correction for Serving Citizens
Wang Recurrent neural network
Jiang et al. An affective chatbot with controlled specific emotion expression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant