CN116258134B - Dialogue emotion recognition method based on convolution joint model - Google Patents

Dialogue emotion recognition method based on convolution joint model

Info

Publication number
CN116258134B
CN116258134B (application CN202310443460.0A; published as CN116258134A)
Authority
CN
China
Prior art keywords
topic
representing
sentence
input
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310443460.0A
Other languages
Chinese (zh)
Other versions
CN116258134A (en)
Inventor
宋彦 (Song Yan)
胡博 (Hu Bo)
田元贺 (Tian Yuanhe)
徐浩培 (Xu Haopei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202310443460.0A priority Critical patent/CN116258134B/en
Publication of CN116258134A publication Critical patent/CN116258134A/en
Application granted granted Critical
Publication of CN116258134B publication Critical patent/CN116258134B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/11 - Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Optimization (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dialogue emotion recognition method based on a convolution joint model, wherein the convolution joint model comprises a neural topic model and an attention relation graph network model. The dialogue emotion recognition method comprises the following step: inputting the sentences of a dialogue into the trained convolution joint model to output the emotion categories corresponding to the sentences of the dialogue. The method makes full use of the implicit topic information of sentences to strengthen the information interaction among dialogue sentences and their feature representations, and helps the decoder predict emotion categories.

Description

Dialogue emotion recognition method based on convolution joint model
Technical Field
The invention relates to the technical field of dialogue emotion recognition, in particular to a dialogue emotion recognition method based on a convolution joint model.
Background
Emotion recognition in a dialogue refers to recognizing the emotion category of every sentence in the dialogue. Existing methods tend to model the relations among the input sentences directly, or only introduce word-level external knowledge to enhance the sentence representations before processing them. Dialogues, however, are characterized by topic jumps, loose structure, information redundancy and the like, so existing dialogue emotion recognition methods cannot adapt to the complex interaction relations among sentence topics in a dialogue, nor can they fully fuse the context information of sentences with similar topics for deep reasoning to achieve accurate emotion recognition.
Disclosure of Invention
In view of the technical problems in the background art, the invention provides a dialogue emotion recognition method based on a convolution joint model, which makes full use of the implicit topic information of sentences to strengthen the information interaction among dialogue sentences and their feature representations, and helps the decoder predict emotion categories.
The invention provides a dialogue emotion recognition method based on a convolution joint model, wherein the convolution joint model comprises a neural topic model and an attention relation graph network model, and the dialogue emotion recognition method comprises the following steps: inputting sentences in the dialogue into the trained convolution joint model to output emotion categories corresponding to the sentences in the dialogue;
the training process of the convolution joint model is as follows:
S1: constructing an input dialogue C = {X_1, X_2, ..., X_n} and encoding it to obtain the encoding vectors h_1, h_2, ..., h_n of all sentences X_1, ..., X_n, where n is the total number of sentences in the dialogue;
S2: using the prior parameters μ and σ of a given dataset, extracting the latent topic distribution Z and training the neural topic model of the variational auto-encoder in an unsupervised manner to obtain a topic feature matrix W_T of size |V| × K, where |V| and K denote the vocabulary size and the number of topics respectively;
S3: using the topic feature matrix W_T, mapping all words of sentence X_i in the input dialogue C to word codes, obtaining the encoding v_i of sentence X_i by average pooling, and computing the self-attention coefficient a_i of each topic based on the encoding v_i;
S4: computing a representation of each topic in the topic feature matrix W_T with a multi-layer perceptron to obtain the topic memory representations m_k, and aggregating the topic memory representations m_k with the self-attention coefficients a_i to obtain the topic representation vector t_i of sentence X_i;
S5: taking the n sentences of the input dialogue C and the corresponding n topic features as nodes of the attention relation graph network model, and using the sentence encoding vectors h_i and the topic representation vectors t_i to generate the 2n initial node representations of the attention relation graph network model, denoted g_1^(0), ..., g_{2n}^(0), where g_j^(0) = h_j when 1 ≤ j ≤ n, and g_j^(0) = t_{j-n} when n < j ≤ 2n;
S6: obtaining the adjacency matrix A and the relation matrix R between nodes from the interconnections and interactions of the different nodes in the attention relation graph network model, and modelling A and R with the attention relation graph network model, where a_{jk} is the edge between any two nodes ν_j and ν_k, a_{jk} is 1 if the two nodes are connected and 0 otherwise, and r_{jk} denotes the label of the edge a_{jk};
S7: based on the adjacency matrix A and the relation matrix R, obtaining the node representations g_i^(L) output by the last layer of the attention relation graph network model, concatenating g_i^(L) with the encoding vector h_i of sentence X_i obtained in step S1 to obtain the enhanced hidden vector o_i of sentence X_i, feeding the enhanced hidden vector o_i to a decoder to output the prediction vector y_i, and selecting the emotion category with the highest score in the prediction vector y_i as the predicted emotion category of sentence X_i;
wherein the formulas used to extract the latent topic distribution Z from the prior parameters μ and σ of the given dataset and obtain the topic feature matrix W_T are specifically as follows:
e = MLP(x), z_i = μ + σ · ε, p_i = softmax(W_T · z_i + b)
wherein p_i denotes the prediction probability over the vocabulary of the i-th word w_i of an input sentence X in the given dataset, z_i denotes the topic distribution of the i-th word w_i, W_T denotes the trainable topic feature matrix, b denotes a trainable vector, μ and σ denote the prior parameters of the given dataset, ε is a random variable, MLP(·) denotes a multi-layer perceptron, x denotes the one-hot encoding corresponding to the input sentence X, e denotes the implicit representation of the one-hot encoding x, e_i is the implicit representation of the i-th word w_i, w_i is a word of the input sentence X, and x is obtained by one-hot encoding all words of the input sentence X.
Further, in step S1, constructing the input dialogue C and encoding it to obtain the encoding vectors h_1, ..., h_n of all sentences X_1, ..., X_n is specifically:
the input dialogue C is fed into a RoBERTa encoder for encoding to obtain the initial encoding vectors u_1, ..., u_n of all sentences in C;
the initial encoding vectors u_1, ..., u_n corresponding to all sentences are fed into a BiLSTM network to obtain the encoding vectors h_1, ..., h_n of all sentences.
Further, in step S2: using a priori parameters of the given dataset and />Extracting potential topic distribution->Training the neural topic model of the variational self-encoder in an unsupervised manner to obtain a topic feature matrix +.>Specifically, the method comprises the following steps:
in addition, the given data set is input into a neural topic model of the variable self-encoder, and the neural topic model processes the given data set as follows:
input sentences in given data setProcessing to obtain a single thermal code->One-time heat encoding->Delivering to a multi-layer perceptron to obtain said input sentenceXImplicit representation of +.>
Based on implicit representationA priori parameters of the potential topic distribution Z> and />Estimation is performed from a priori parameters +.> and />Random decimation in the topic distribution Z of the representation>As said input sentence->Wherein the implicit representation +.>Is the firstPersonal word->Implicit representation of the word->For inputting sentencesXWords of (a);
trainable theme feature matrixAnd trainable vector->Representation of potential topic->Performing linear transformation and->After the function operation, the predictive probability of the word is obtained>
Training in an unsupervised mannerTraining the neural topic model, and then obtaining a topic feature matrixEach line is the +.>Personal word embedding->Each column is +.>Personal topic embedding->
Further, in step S3: using topic feature matricesWill input dialogue->Chinese sentence->Is mapped to word codes, and the sentence ++is obtained by averaging pooling>Coding of->Based on the coding->Calculating the self-attention coefficient of each topic +.>Specifically, the method comprises the following steps:
using topic feature matricesWill input dialogue->Chinese sentence->Mapping all words of (2) to word code +.>,/>For statement->The total number of midwords;
encoding wordsPerforming average pooling to obtain sentence->Coding of->Statement->Is>The subject is code->In->Numerical value of individual dimension>Based on multiple layersSensor logarithmic value->The dimension of (2) is expanded to obtain an expansion vector +.>
Based on expansion vectorAnd coding vector->Calculating the self-attention coefficient of each theme>
Further, the encoding v_i and the related quantities are computed as follows:
v_i = AvgPool(q_{i,1}, ..., q_{i,T_i}), d_{i,k} = MLP_d(v_{i,k}), a_i = softmax(s_i)
wherein AvgPool(·) denotes average pooling, v_{i,k} denotes the probability that the i-th sentence is related to the k-th topic, d_{i,k} denotes the expansion vector obtained by expanding the dimension of the value v_{i,k}, s_i denotes the scores computed from the expansion vectors d_{i,k} and the encoding vector v_i, softmax(·) denotes the softmax function, and MLP_d(·) denotes a multi-layer perceptron that maps a single probability value to a high-dimensional vector.
Further, the topic memory representation m_k is computed as:
m_k = MLP_m(T_k), k = 1, ..., K
wherein T_k is the k-th topic embedding of the topic feature matrix W_T, MLP_m(·) denotes a multi-layer perceptron that maps each topic embedding to a vector m_k of a fixed dimension, and K denotes the number of topics.
Further, step S6 specifically comprises:
obtaining the adjacency matrix A and the relation matrix R between nodes from the interconnections and interactions of the different nodes in the attention relation graph network model;
based on the adjacency matrix A and the relation matrix R, the attention relation graph network model G = (E, A, R) models the 2n initial nodes, where the edge between any node pair (ν_j, ν_k) in the node set E is mapped to one element a_{jk} of the adjacency matrix A, and the element a_{jk} can in turn be mapped to one element r_{jk} of the relation matrix R;
the values of the elements in the relation matrix R fall into three types: utterance-utterance, topic-topic and topic-utterance; these three types are the edge types of the attention relation graph network model, and each edge type comprises different kinds of values;
for utterance-utterance edges, eight kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C and whether the two sentences come from the same speaker: adjacent-future-self, adjacent-future-other, adjacent-past-self, adjacent-past-other, distant-future-self, distant-future-other, distant-past-self, distant-past-other;
for topic-topic edges, four kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C: adjacent-future, adjacent-past, distant-future, distant-past;
for topic-utterance edges, a single kind of value is introduced: influence.
Further, in step S7, for each node representation g_i^(l), the information of the other nodes g_j^(l) connected to node ν_i is aggregated into node ν_i, giving the updated node representation g_i^(l+1):
g_i^(l+1) = ReLU( W_0^(l)·g_i^(l) + Σ_{r∈R_i} Σ_{ν_j∈E_i} α_ij·W_r^(l)·g_j^(l) )
wherein g_i^(l+1) denotes the representation of node ν_i in layer l+1 of the attention relation graph network model, g_i^(l) is the representation of node ν_i in layer l, E_i is the set of other nodes connected to node ν_i, R_i is the set of possible values of the relation matrix R for edges whose starting node is ν_i, W_0^(l) is the self-connection matrix of layer l of the attention relation graph network model, W_r^(l) is the matrix of layer l that extracts the information of other nodes ν_j under relation r given the current node ν_i, g_j^(l) and g_j^(l+1) are the representations of node ν_j in layers l and l+1, α_ij denotes the coefficient with which the information of node ν_j is aggregated into node ν_i at layer l+1, β_ij indicates whether nodes ν_i and ν_j are connected (0 if connected, 1 otherwise), β_ik likewise indicates whether nodes ν_i and ν_k are connected (0 if connected, 1 otherwise), the coefficients α_ij are obtained by a softmax over all nodes in which unconnected nodes are masked out through β, E denotes the set of all nodes in the attention relation graph network model, and ν_k denotes the k-th node in E.
Further, the enhanced hidden vector o_i is computed as:
o_i = [ g_i^(L) ; h_i ]
and the prediction vector y_i is computed as:
y_i = softmax( W_d·o_i + b_d )
wherein the value of each dimension of y_i represents the score of the emotion category represented by that dimension, W_d and b_d are trainable parameters that map the dimension of the enhanced hidden vector o_i to the number of emotion categories, and [ ; ] denotes concatenation between tensors.
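A minimal illustrative sketch of such a decoder is given below, assuming PyTorch; the class name, dimensions and argument names are illustrative assumptions and not taken from this embodiment.

```python
# Hedged sketch of the decoder described above: it concatenates the last-layer graph
# representation with the sentence encoding and maps the result to emotion scores.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmotionDecoder(nn.Module):
    def __init__(self, graph_dim, sent_dim, num_emotions):
        super().__init__()
        # W_d / b_d: trainable mapping from the enhanced hidden vector to emotion scores
        self.proj = nn.Linear(graph_dim + sent_dim, num_emotions)

    def forward(self, g_L, h):
        o = torch.cat([g_L, h], dim=-1)          # enhanced hidden vector o_i = [g_i^(L); h_i]
        y = F.softmax(self.proj(o), dim=-1)      # prediction vector y_i (one score per emotion)
        return y, y.argmax(dim=-1)               # predicted emotion = highest-scoring category
```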
Further, the convolution joint model is trained with a loss function until it converges to the optimal state, specifically:
the prediction vectors y_1, ..., y_n corresponding to all sentences X_1, ..., X_n of the input dialogue C are aggregated into a prediction vector set;
the cross entropy between the prediction vector set and the true emotion category set Ŷ = {ŷ_1, ..., ŷ_n} corresponding to the input dialogue C is computed as the loss function of the convolution joint model, and the convolution joint model reaches the convergence state by minimizing this loss function;
the loss function ℒ is:
ℒ = − Σ_{i=1}^{n} Σ_{j=1}^{m} ŷ_{i,j} · log p_{i,j}
wherein m is the total number of emotions, p_{i,j} denotes the probability, predicted by the convolution joint model, that sentence X_i belongs to the j-th emotion y_j of the emotion category set Y, and ŷ_{i,j} indicates whether the true emotion category of sentence X_i is y_j: ŷ_{i,j} is 1 if it is and 0 otherwise.
The dialogue emotion recognition method based on the convolution joint model has the following advantages: with the structure provided above, sentence topic features are extracted by the neural topic model, so that dialogue sentences with similar topics can use each other's context information for joint reasoning, which alleviates problems such as frequent topic jumps and information redundancy in dialogues and improves the representation quality of the utterances; the relation-driven sentence information and topic information are fused through the attention relation graph network model to obtain enhanced feature representations, which improves the emotion recognition performance of the model on the dialogue.
Drawings
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a framework diagram of the construction of a convolution joint model.
Detailed Description
In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit or scope of the invention, which is therefore not limited to the specific embodiments disclosed below.
As shown in fig. 1 and fig. 2, in the dialogue emotion recognition method based on the convolution joint model, the sentences of a dialogue are input into the trained convolution joint model to output the emotion categories corresponding to the sentences of the dialogue; the convolution joint model comprises an encoder, a neural topic model, an attention relation graph network model and a decoder which are connected in sequence, so that the emotion categories are output through the decoder.
In this embodiment, external topic knowledge is introduced by a neural topic model (whose backbone is a variational auto-encoder) to extract the topic features of each sentence, and the relations among sentences, among topic features, and between sentences and topic features are modelled through the attention relation graph network model, so that the implicit topic information of the sentences is fully used to strengthen the information interaction among dialogue sentences and their feature representations and to help the decoder predict emotion categories; the emotion categories are output by the convolution joint model, and the method comprises the following steps:
to facilitate a detailed description of the convolution joint model training process, the following symbol labels are introduced:
the emotion category set is denoted Y = {y_1, ..., y_m}, where y_j is the j-th emotion and m is the total number of emotions;
the input dialogue is denoted C = {X_1, ..., X_n}, where X_i is the i-th sentence of the dialogue and n is the total number of sentences in the dialogue;
the true emotion category set corresponding to the input dialogue C is denoted Ŷ = {ŷ_1, ..., ŷ_n}, where ŷ_i indicates the true emotion category of sentence X_i;
the attention relation graph network model is denoted G = (E, A, R), where E is the set of nodes in the attention relation graph network model, A is the adjacency matrix between nodes, R is the relation matrix between nodes, and the number of layers of the attention relation graph network model is denoted L.
The training process of the convolution joint model is as follows:
S1: construct the input dialogue C = {X_1, ..., X_n} and encode it to obtain the encoding vectors h_1, ..., h_n of all sentences X_1, ..., X_n, where n is the total number of sentences in the dialogue; this specifically comprises steps S11 to S12;
S11: the sentences of the input dialogue C are fed into a RoBERTa encoder for encoding to obtain the initial encoding vectors u_1, ..., u_n of all sentences in C;
S12: the initial encoding vectors u_1, ..., u_n corresponding to all sentences are fed into a BiLSTM network to obtain the encoding vectors h_1, ..., h_n of all sentences, where the BiLSTM network is an existing bidirectional long short-term memory network;
Steps S11 to S12 encode each sentence of the input dialogue C. Through deep text encoding and rich semantic representation, the RoBERTa encoder helps the BiLSTM (bidirectional long short-term memory) model better understand the words and semantic information in the sentences; through its sequence modelling and long-range dependency modelling ability, the BiLSTM model captures the context structure of the dialogue and the long-distance dependencies between sentences; combining the two (RoBERTa encoder and BiLSTM model) handles complex semantic relations in the sentences better and thus improves the quality and expressiveness of the sentence encodings.
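A minimal sketch of this two-stage encoder is given below, assuming PyTorch and the Hugging Face `transformers` RoBERTa checkpoint; the class name, hidden size and pooling choice are illustrative assumptions, not part of this embodiment.

```python
# Hedged sketch of step S1: RoBERTa produces an initial vector per sentence, and a
# BiLSTM over the sequence of sentence vectors yields the dialogue-aware encodings h_i.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class SentenceEncoder(nn.Module):
    def __init__(self, hidden_size=300):
        super().__init__()
        self.tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        self.bilstm = nn.LSTM(self.roberta.config.hidden_size, hidden_size,
                              batch_first=True, bidirectional=True)

    def forward(self, sentences):
        # sentences: list of n strings, one dialogue C = {X_1, ..., X_n}
        batch = self.tokenizer(sentences, padding=True, truncation=True,
                               return_tensors="pt")
        out = self.roberta(**batch).last_hidden_state   # (n, seq_len, 768)
        u = out[:, 0, :]                                 # initial encoding vectors u_i
        h, _ = self.bilstm(u.unsqueeze(0))               # BiLSTM over the dialogue
        return h.squeeze(0)                              # encoding vectors h_1 ... h_n
```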
S2: using a priori parameters of the given dataset and />Extracting potential topic distribution->Training the neural topic model of the variational self-encoder in an unsupervised manner to obtain a topic feature matrix +.>, wherein /> and />Respectively representing the vocabulary size and the number of topics;
in addition, the given data set is input into a neural topic model of the variable self-encoder, and the neural topic model processes the given data set as follows, specifically comprising steps S21 to S4:
s21: in addition, given data set, input sentences in the given data setProcessing to obtain a single thermal code->One-time heat encoding->Delivering to a multi-layer perceptron to obtain said input sentence +.>Implicit representation of +.>
wherein ,representing statement +.>Corresponding one-hot coding,/->A multi-layer sensor is shown as such,,/>representing one-hot code->Implicit representation of->Representing +.>Performing single-heat coding on all words in the list;
it should be noted that the additional given data set and the construction of the input dialogNot belonging to the same training set, the further given data set may be expressed in particular as the further input sentence +.>,/>(/>) Representing input sentence +.>The word of (the word is specifically associated with the input sentence +.>The words in (a) are corresponding).
S22: based on implicit representationA priori parameters of the potential topic distribution Z> and />Estimation is performed from a priori parameters +.> and />Random decimation in the topic distribution Z of the representation>As said input sentence->Wherein the implicit representation +.>Is->Personal word->Implicit representation of the word->For inputting sentencesXThe words in (a) are specifically:
wherein , and />A priori parameters representing a given dataset, +.>Is a random variable, it being understood that,/-is>Is->Personal word->At->Implicit representation of the corresponding in (a);
s23: trainable theme feature matrixAnd trainable vector->Representation of potential topic->Performing linear transformation and->After the function operation, the predictive probability of the word is obtained>
wherein ,representing input sentence +.>Personal word->The prediction probability of each word in the corresponding vocabulary can be used for training parameter learning in a neural topic model based on a VAE (variable value) unsupervised mode; />For entering sentence +.>Personal word->The topic distribution of (2) can be used for participating in the subsequent training of parameter learning in a neural topic model based on a VAE (variable value) unsupervised mode;representing a trainable topic feature matrix, +.>Representing trainable vectors, optimizing +_during training of neural topic models> and />These two learnable parameters to bring the neural topic model to the desired output;
S24: the neural topic model is trained in an unsupervised manner, after which the topic feature matrix W_T of size |V| × K is obtained; its v-th row is the embedding of the v-th word and its k-th column is the embedding T_k of the k-th topic;
The given dataset of step S21, after the data processing of steps S21 to S23, serves as the input of the neural topic model in step S24, and the neural topic model is trained in an unsupervised manner.
wherein |V| and K are the vocabulary size and the number of topics respectively; each row of W_T can be regarded as the word embedding of a specific word, each dimension of which corresponds to a probability value of that word with respect to a particular topic; likewise, each column of W_T can be regarded as the topic embedding of a particular topic (the k-th topic embedding is denoted T_k), each dimension of which corresponds to a probability value of that topic with respect to a particular word.
Steps S21 to S24 train the topic feature matrix of the topic model; each row of the trained topic feature matrix represents a word embedding and each column represents a topic embedding, so that the representations of words and of topics are naturally connected through the topic feature matrix; each element of the topic feature matrix describes the correlation between one particular word and one particular topic, and the word representations and topic representations merely reflect the information of the topic feature matrix along different dimensions; using the obtained topic feature matrix, the topic representation corresponding to a given word representation (or sentence representation) can be obtained, and the value of each dimension of that topic representation is the probability that the word (or sentence) is related to the corresponding topic.
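A hedged sketch of such a VAE-style neural topic model is given below, assuming PyTorch; the layer sizes, the standard-normal prior and the training objective are assumptions chosen to match the usual variational formulation, not details taken from this embodiment.

```python
# Hedged sketch of steps S21-S24: bag-of-words encoding -> implicit representation e ->
# parameters (mu, sigma) -> sampled latent topic z -> word prediction p via W_T and b.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTopicModel(nn.Module):
    def __init__(self, vocab_size, num_topics, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, num_topics)        # estimates mu from e
        self.to_logsigma = nn.Linear(hidden, num_topics)  # estimates sigma from e
        # Topic feature matrix W_T (|V| x K): rows ~ word embeddings, columns ~ topic embeddings.
        self.W_T = nn.Parameter(torch.randn(vocab_size, num_topics) * 0.01)
        self.b = nn.Parameter(torch.zeros(vocab_size))

    def forward(self, x_bow):
        # x_bow: (batch, |V|) one-hot / bag-of-words counts of an input sentence X
        e = self.encoder(x_bow)                           # implicit representation e
        mu, logsigma = self.to_mu(e), self.to_logsigma(e)
        eps = torch.randn_like(mu)                        # random variable epsilon
        z = mu + torch.exp(logsigma) * eps                # latent topic drawn from Z
        p = F.softmax(z @ self.W_T.t() + self.b, dim=-1)  # word prediction probabilities
        return p, mu, logsigma

    def loss(self, x_bow):
        # Unsupervised objective: reconstruction + KL term (standard VAE-style NTM).
        p, mu, logsigma = self.forward(x_bow)
        recon = -(x_bow * torch.log(p + 1e-10)).sum(-1)
        kl = -0.5 * (1 + 2 * logsigma - mu ** 2 - torch.exp(2 * logsigma)).sum(-1)
        return (recon + kl).mean()
```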
S3: using topic feature matricesWill input dialogue->Chinese sentence->Is mapped to word codes, and the sentence ++is obtained by averaging pooling>Coding of->Based on the coding->Calculating the self-attention coefficient of each topic +.>Specifically, steps S31 to S33 are included:
s31: using topic feature matricesWill input dialogue->Chinese sentence->Mapping all words of (2) to word code +.>,/>For statement->The total number of midwords;
s32: encoding wordsPerforming average pooling to obtain sentence->Coding of->Statement->Is>The subject is code->In->Numerical value of individual dimension>Based on the logarithmic value of the multilayer sensor>Expanding the dimension to obtain an expansion vector
Due to codingWord code projected by all topic feature matrices +.>Averaging the pooled results, encoding +.>The word code +.>I.e. each dimension represents a correlation with a certain topic, i.e.: coding->Is sentence->Is represented by +.>Probability associated with the corresponding topic->I.e. +.>For vector representation, +.>Is represented by a numerical value;
wherein ,representing average pooling>Indicate->Personal statement and->Probability of being related to individual topic,/->Express logarithmic value +.>Expansion vector obtained after expansion of dimension of (2),. About.>Representation->Function (F)>Representing a multi-layer perceptron for mapping individual probability values into vectors of high dimensionality.
S33: based on expansion vectorAnd coding vector->Calculating the self-attention coefficient of each theme>
wherein ,representation->Function (F)>Representing a multi-layer perceptron.
S4: calculating a topic feature matrix using multi-layer perceptronsThe representation of each topic in (a) gives the topic memory representation +.>By means of the self-attention coefficient->Characterization of the subject memory>Aggregation is carried out to obtain statement +.>Subject characterization vector->
wherein ,for the topic feature matrix->The%>Personal topic embedding->Representing a multi-layered perceptron for embedding and mapping each topic as a vector + ->Dimension of->Representing the number of topics; it should be noted that->Related to the topic feature matrix, i.e. to +.>Related, and sentence code->Different.
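The exact form of the self-attention over topics is not fully specified above, so the sketch below, assuming PyTorch, fills that step with an assumed scoring MLP; the class name, dimensions and the way d_{i,k} and v_i are combined are all illustrative assumptions.

```python
# Hedged sketch of steps S3-S4: word codes from W_T -> average pooling -> per-topic
# expansion vectors -> self-attention over topics -> aggregated topic representation t_i.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicRepresentation(nn.Module):
    def __init__(self, W_T, dim=300):
        super().__init__()
        self.W_T = W_T                        # (|V|, K) topic feature matrix from the NTM
        self.expand = nn.Linear(1, dim)       # MLP_d: expands each scalar v_{i,k} to d_{i,k}
        self.mem = nn.Linear(W_T.shape[0], dim)  # MLP_m: maps each topic embedding T_k to m_k
        self.score = nn.Linear(2 * dim, 1)    # assumed scorer combining d_{i,k} with context

    def forward(self, word_ids):
        # word_ids: LongTensor with the vocabulary indices of the words of sentence X_i
        q = self.W_T[word_ids]                # word codes q_{i,j}, shape (T_i, K)
        v = q.mean(dim=0)                     # sentence encoding v_i by average pooling, (K,)
        d = self.expand(v.unsqueeze(-1))      # expansion vectors d_{i,k}, (K, dim)
        m = self.mem(self.W_T.t())            # topic memories m_k = MLP_m(T_k), (K, dim)
        ctx = d.mean(dim=0, keepdim=True).expand_as(d)   # assumed sentence-level context
        a = F.softmax(self.score(torch.cat([d, ctx], -1)).squeeze(-1), dim=0)  # a_{i,k}
        t_i = (a.unsqueeze(-1) * m).sum(dim=0)            # topic representation vector t_i
        return t_i
```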
Through steps S2 to S4, the topic feature matrix in the neural topic model is trained and the topic features of the utterances (topic representation vectors) are extracted, which realises effective training of the neural topic model.
Steps S2 to S4 provide a mechanism for enhancing dialogue sentence representations with the neural topic model: by extracting sentence topic features with the neural topic model, dialogue sentences with similar topics can use each other's context information for joint reasoning, which alleviates problems such as frequent topic jumps and information redundancy in dialogues and improves the representation quality of the utterances.
S5: take the n sentences of the input dialogue C and the corresponding n topic features as nodes of the attention relation graph network model, and use the sentence encoding vectors h_i and the topic representation vectors t_i to generate the 2n initial node representations of the attention relation graph network model, denoted g_1^(0), ..., g_{2n}^(0), where g_j^(0) = h_j when 1 ≤ j ≤ n and g_j^(0) = t_{j-n} when n < j ≤ 2n.
S6: obtaining an adjacency matrix between nodes according to the interconnection and action relation of different nodes in the attention relation graph network modelAnd relation matrix->And using the attention relationship graph network model to +.>And relation matrix->Modeling is performed in which->For any two nodes->Edge between, if two nodes +.>Connect with->1, otherwise->0->Representing edge->The marked values include in particular:
s61: obtaining an adjacency matrix between nodes according to the interconnection and action relation of different nodes in the attention relation graph network modelAnd relation matrix->
S62: based on adjacency matrixAnd relation matrix->To make full use of the relationships between sentences and subject features in a dialog, a attention relationship graph network model is utilized>For->Modeling of the initial nodes, wherein->Is +.>Are mapped to adjacency matrix->One element of->While element->Can be mapped to a relation matrix->One element of->
Wherein the values of the elements in the relation matrix R fall into three types: utterance-utterance, topic-topic and topic-utterance; these three types are the edge types of the attention relation graph network model, and each edge type comprises different kinds of values;
for utterance-utterance edges, eight kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C and whether the two sentences come from the same speaker: adjacent-future-self, adjacent-future-other, adjacent-past-self, adjacent-past-other, distant-future-self, distant-future-other, distant-past-self, distant-past-other;
for topic-topic edges, four kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C: adjacent-future, adjacent-past, distant-future, distant-past;
for topic-utterance edges, a single kind of value is introduced: influence. A sketch of this construction is given below.
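The sketch below builds A and R from a list of speaker labels; the node layout (sentence nodes first, topic nodes second), the adjacency window and the integer relation ids are illustrative assumptions, not details fixed by this embodiment.

```python
# Hedged sketch of step S6: constructing the adjacency matrix A and relation matrix R
# with the three edge types (utterance-utterance, topic-topic, topic-utterance).
import numpy as np

def build_graph(speakers, window=1):
    n = len(speakers)              # one speaker label per sentence X_1 ... X_n
    N = 2 * n                      # n sentence nodes + n topic nodes
    A = np.zeros((N, N), dtype=int)
    R = np.full((N, N), -1, dtype=int)
    rel_ids = {}                   # maps a relation name to an integer label

    def rel(name):
        return rel_ids.setdefault(name, len(rel_ids))

    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            adj = "adjacent" if abs(i - j) <= window else "distant"
            tense = "future" if j > i else "past"
            who = "self" if speakers[i] == speakers[j] else "other"
            # utterance-utterance edge: eight kinds of values
            A[i, j] = 1
            R[i, j] = rel(f"utt-utt/{adj}-{tense}-{who}")
            # topic-topic edge between the corresponding topic nodes: four kinds of values
            A[n + i, n + j] = 1
            R[n + i, n + j] = rel(f"top-top/{adj}-{tense}")
        # topic-utterance edge: a single kind of value, "influence"
        A[n + i, i] = A[i, n + i] = 1
        R[n + i, i] = R[i, n + i] = rel("top-utt/influence")
    return A, R, rel_ids
```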
S7: based on adjacency matrixAnd relation matrix->Representing the node of the last layer output in the attention relation graph network model by +.>Sum sentence->The corresponding coding vector in step S1 +.>Performing adhesion to obtain sentence->Is (are) enhanced hidden vector->The enhancement hidden vector +.>Output of prediction vector by decoder>Selecting said predictive vector->Emotion category with highest score as sentence +.>Is used for predicting emotion classification;
characterization for each nodeWill be->Other nodes with connections->Is aggregated to a nodeIn, the updated node representation +.>
wherein ,representing node->In->Corresponding node representation in the layer attention relationship graph network model,/-for>For node->In->Corresponding node representation in the layer attention relationship graph network model,/-for>Is +.>A set of other nodes with connections, +.>Is a relation matrix->The starting node is->Is a set of possible values of ∈ ->Is the attention relation graph network model +.>Matrix for self-connection in layer, +.>Is->In the layer attention relation graph network model, the current node +.>In relation->For extracting other nodes under the condition->Matrix of information->For node->In->Corresponding node representation in the layer attention relationship graph network model,/-for>For node->In->Corresponding node representation in the layer attention relationship graph network model,/-for>Indicate->Node +.>For->Aggregate information coefficient,/->Representing node->And->Whether there is a connection between them, if there is a connection +.>0, otherwise->1->Representing node->And->Whether there is a connection between them, if there is a connection +.>0, otherwise->1->Representing a set of all nodes in the attention relationship graph network model; />Representation->Middle->Personal node->Representing the connection between tensors.
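A hedged sketch of one such layer is given below, assuming PyTorch; because the text only states that unconnected nodes are masked out of the softmax, the concrete attention scorer is an assumption.

```python
# Hedged sketch of the layer update in step S7: relation-specific transforms W_r,
# a self-connection W_0, and attention coefficients alpha masked by the adjacency matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationGraphAttentionLayer(nn.Module):
    def __init__(self, dim, num_relations):
        super().__init__()
        self.W0 = nn.Linear(dim, dim, bias=False)                  # self-connection matrix W_0
        self.Wr = nn.ModuleList(nn.Linear(dim, dim, bias=False)
                                for _ in range(num_relations))     # per-relation matrices W_r
        self.att = nn.Linear(2 * dim, 1)                           # assumed attention scorer

    def forward(self, g, A, R):
        # g: (N, dim) node representations g^(l); A: (N, N) 0/1 tensor; R: (N, N) relation ids (-1 = none)
        N = g.size(0)
        scores = self.att(torch.cat([g.unsqueeze(1).expand(N, N, -1),
                                     g.unsqueeze(0).expand(N, N, -1)], dim=-1)).squeeze(-1)
        scores = scores.masked_fill(A == 0, float("-inf"))         # beta mask: drop unconnected nodes
        alpha = torch.nan_to_num(F.softmax(scores, dim=-1))        # aggregation coefficients alpha_ij
        out = self.W0(g)                                           # self term W_0 g_i
        for r, Wr in enumerate(self.Wr):
            mask = (R == r).float() * alpha                        # neighbours reached under relation r
            out = out + mask @ Wr(g)                               # sum_r sum_j alpha_ij W_r g_j
        return F.relu(out)                                         # updated representations g^(l+1)
```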
According to steps S5 to S7, the relation-driven dialogue sentence information and topic information are fused through the attention relation graph network model to obtain enhanced feature representations, which improves the emotion recognition performance of the model on the dialogue.
Through steps S1 to S7, the neural topic model and the attention relation graph network model are combined: the topic representation vectors output by the neural topic model participate in the initial node representations of the attention relation graph network model, so that the final convolution joint model achieves accurate and efficient recognition of dialogue emotions.
In this embodiment, the convolution joint model is trained with a loss function until it converges to the optimal state, specifically:
the prediction vectors y_1, ..., y_n corresponding to all sentences X_1, ..., X_n of the input dialogue C are aggregated into a prediction vector set;
the cross entropy between the prediction vector set and the true emotion category set Ŷ = {ŷ_1, ..., ŷ_n} corresponding to the input dialogue C is computed as the loss function of the convolution joint model, and the convolution joint model reaches the convergence state by minimizing this loss function;
the loss function ℒ is:
ℒ = − Σ_{i=1}^{n} Σ_{j=1}^{m} ŷ_{i,j} · log p_{i,j}
wherein m is the total number of emotions, p_{i,j} denotes the probability, predicted by the convolution joint model, that sentence X_i belongs to the j-th emotion y_j of the emotion category set Y, and ŷ_{i,j} indicates whether the true emotion category of sentence X_i is y_j: ŷ_{i,j} is 1 if it is and 0 otherwise.
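A minimal sketch of this cross-entropy objective is given below, assuming PyTorch; the function name and the use of a mean instead of a sum over sentences are illustrative choices.

```python
# Hedged sketch of the training objective: cross entropy between the prediction
# vectors of all sentences in the dialogue and their true emotion categories.
import torch
import torch.nn.functional as F

def dialogue_emotion_loss(pred, gold):
    # pred: (n, m) prediction vectors y_1..y_n (probabilities over m emotion categories)
    # gold: (n,)  true emotion category indices for sentences X_1..X_n
    return -torch.log(pred[torch.arange(pred.size(0)), gold] + 1e-10).mean()

# Equivalent when `pred` holds raw (pre-softmax) scores: F.cross_entropy(pred, gold)
```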
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification made, within the technical scope disclosed by the present invention, by a person skilled in the art according to the technical solution and the inventive concept of the present invention shall be covered by the scope of protection of the present invention.

Claims (10)

1. A dialogue emotion recognition method based on a convolution joint model, wherein the convolution joint model comprises a neural topic model and an attention relation graph network model, the dialogue emotion recognition method comprising: inputting the sentences of a dialogue into the trained convolution joint model to output the emotion categories corresponding to the sentences of the dialogue;
the training process of the convolution joint model being as follows:
S1: constructing an input dialogue C = {X_1, X_2, ..., X_n} and encoding it to obtain the encoding vectors h_1, h_2, ..., h_n of all sentences X_1, ..., X_n, where n is the total number of sentences in the dialogue;
S2: using the prior parameters μ and σ of a given dataset, extracting the latent topic distribution Z and training the neural topic model of the variational auto-encoder in an unsupervised manner to obtain a topic feature matrix W_T of size |V| × K, where |V| and K denote the vocabulary size and the number of topics respectively;
S3: using the topic feature matrix W_T, mapping all words of sentence X_i in the input dialogue C to word codes, obtaining the encoding v_i of sentence X_i by average pooling, and computing the self-attention coefficient a_i of each topic based on the encoding v_i;
S4: computing a representation of each topic in the topic feature matrix W_T with a multi-layer perceptron to obtain the topic memory representations m_k, and aggregating the topic memory representations m_k with the self-attention coefficients a_i to obtain the topic representation vector t_i of sentence X_i;
S5: taking the n sentences of the input dialogue C and the corresponding n topic features as nodes of the attention relation graph network model, and using the sentence encoding vectors h_i and the topic representation vectors t_i to generate the 2n initial node representations of the attention relation graph network model, denoted g_1^(0), ..., g_{2n}^(0), where g_j^(0) = h_j when 1 ≤ j ≤ n, and g_j^(0) = t_{j-n} when n < j ≤ 2n;
S6: obtaining the adjacency matrix A and the relation matrix R between nodes from the interconnections and interactions of the different nodes in the attention relation graph network model, and modelling A and R with the attention relation graph network model, where a_{jk} is the edge between any two nodes ν_j and ν_k, a_{jk} is 1 if the two nodes are connected and 0 otherwise, and r_{jk} denotes the label of the edge a_{jk};
S7: based on the adjacency matrix A and the relation matrix R, obtaining the node representations g_i^(L) output by the last layer of the attention relation graph network model, concatenating g_i^(L) with the encoding vector h_i of sentence X_i obtained in step S1 to obtain the enhanced hidden vector o_i of sentence X_i, feeding the enhanced hidden vector o_i to a decoder to output the prediction vector y_i, and selecting the emotion category with the highest score in the prediction vector y_i as the predicted emotion category of sentence X_i;
wherein the formulas used to extract the latent topic distribution Z from the prior parameters μ and σ of the given dataset and obtain the topic feature matrix W_T are specifically as follows:
e = MLP(x), z_i = μ + σ · ε, p_i = softmax(W_T · z_i + b)
wherein p_i denotes the prediction probability over the vocabulary of the i-th word w_i of an input sentence X in the given dataset, z_i denotes the topic distribution of the i-th word w_i, W_T denotes the trainable topic feature matrix, b denotes a trainable vector, μ and σ denote the prior parameters of the given dataset, ε is a random variable, MLP(·) denotes a multi-layer perceptron, x denotes the one-hot encoding corresponding to the input sentence X, e denotes the implicit representation of the one-hot encoding x, e_i is the implicit representation of the i-th word w_i, w_i is a word of the input sentence X, and x is obtained by one-hot encoding all words of the input sentence X.
2. The dialogue emotion recognition method based on the convolution joint model according to claim 1, wherein in step S1, constructing the input dialogue C and encoding it to obtain the encoding vectors h_1, ..., h_n of all sentences is specifically:
feeding the input dialogue C into a RoBERTa encoder for encoding to obtain the initial encoding vectors u_1, ..., u_n of all sentences in the input dialogue C;
feeding the initial encoding vectors u_1, ..., u_n corresponding to all sentences into a BiLSTM network to obtain the encoding vectors h_1, ..., h_n of all sentences.
3. The dialogue emotion recognition method based on the convolution joint model according to claim 1, wherein in step S2, training the neural topic model of the variational auto-encoder in an unsupervised manner to obtain the topic feature matrix W_T is specifically:
a further given dataset is input into the neural topic model of the variational auto-encoder, which processes the given dataset as follows:
each input sentence X in the given dataset is processed to obtain its one-hot encoding x, and x is passed to a multi-layer perceptron to obtain the implicit representation e of the input sentence X;
based on the implicit representation e, the prior parameters μ and σ of the latent topic distribution Z are estimated, and a latent topic z_i is drawn at random from the topic distribution Z characterized by μ and σ as the latent topic of the input sentence X, where e_i in the implicit representation e is the implicit representation of the i-th word w_i and w_i is a word of the input sentence X;
the trainable topic feature matrix W_T and the trainable vector b apply a linear transformation followed by a softmax operation to the latent topic z_i, giving the prediction probability p_i of the word;
the neural topic model is trained in an unsupervised manner, after which the topic feature matrix W_T of size |V| × K is obtained, whose v-th row is the embedding of the v-th word and whose k-th column is the embedding T_k of the k-th topic.
4. The dialogue emotion recognition method based on the convolution joint model according to claim 1, wherein in step S3, mapping all words of sentence X_i to word codes, obtaining the encoding v_i by average pooling and computing the self-attention coefficient a_i of each topic is specifically:
using the topic feature matrix W_T, all words of sentence X_i in the input dialogue C are mapped to the word codes q_{i,1}, ..., q_{i,T_i}, where T_i is the total number of words in sentence X_i;
the word codes are average-pooled to obtain the encoding v_i of sentence X_i; the value v_{i,k} of the k-th dimension of v_i corresponds to the k-th topic, and a multi-layer perceptron expands the dimension of the value v_{i,k} to obtain the expansion vector d_{i,k};
based on the expansion vectors d_{i,k} and the encoding vector v_i, the self-attention coefficient a_i of each topic is computed.
5. The dialogue emotion recognition method based on the convolution joint model according to claim 4, wherein the encoding v_i and the related quantities are computed as follows:
v_i = AvgPool(q_{i,1}, ..., q_{i,T_i}), d_{i,k} = MLP_d(v_{i,k}), a_i = softmax(s_i)
wherein AvgPool(·) denotes average pooling, softmax(·) denotes the softmax function, v_{i,k} denotes the probability that the i-th sentence is related to the k-th topic, d_{i,k} denotes the expansion vector obtained by expanding the dimension of the value v_{i,k}, s_i denotes the scores computed from the expansion vectors d_{i,k} and the encoding vector v_i, and MLP_d(·) denotes a multi-layer perceptron that maps a single probability value to a high-dimensional vector.
6. The dialogue emotion recognition method based on the convolution joint model according to claim 4, wherein the topic memory representation m_k is computed as:
m_k = MLP_m(T_k), k = 1, ..., K
wherein T_k is the k-th topic embedding of the topic feature matrix W_T, MLP_m(·) denotes a multi-layer perceptron that maps each topic embedding to a vector m_k of a fixed dimension, and K denotes the number of topics.
7. The dialogue emotion recognition method based on the convolution joint model according to claim 1, wherein step S6 specifically comprises:
obtaining the adjacency matrix A and the relation matrix R between nodes from the interconnections and interactions of the different nodes in the attention relation graph network model;
based on the adjacency matrix A and the relation matrix R, the attention relation graph network model G = (E, A, R) models the 2n initial nodes, where the edge between any node pair (ν_j, ν_k) in the node set E is mapped to one element a_{jk} of the adjacency matrix A, and the element a_{jk} can in turn be mapped to one element r_{jk} of the relation matrix R;
the values of the elements in the relation matrix R fall into three types: utterance-utterance, topic-topic and topic-utterance; these three types are the edge types of the attention relation graph network model, and each edge type comprises different kinds of values;
for utterance-utterance edges, eight kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C and whether the two sentences come from the same speaker: adjacent-future-self, adjacent-future-other, adjacent-past-self, adjacent-past-other, distant-future-self, distant-future-other, distant-past-self, distant-past-other;
for topic-topic edges, four kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C: adjacent-future, adjacent-past, distant-future, distant-past;
for topic-utterance edges, a single kind of value is introduced: influence.
8. The dialogue emotion recognition method based on the convolution joint model according to claim 7, wherein in step S7, for each node representation g_i^(l), the information of the other nodes g_j^(l) connected to node ν_i is aggregated into node ν_i, giving the updated node representation g_i^(l+1):
g_i^(l+1) = ReLU( W_0^(l)·g_i^(l) + Σ_{r∈R_i} Σ_{ν_j∈E_i} α_ij·W_r^(l)·g_j^(l) )
wherein g_i^(l+1) denotes the representation of node ν_i in layer l+1 of the attention relation graph network model, g_i^(l) is the representation of node ν_i in layer l, E_i is the set of other nodes connected to node ν_i, R_i is the set of possible values of the relation matrix R for edges whose starting node is ν_i, W_0^(l) is the self-connection matrix of layer l of the attention relation graph network model, W_r^(l) is the matrix of layer l that extracts the information of other nodes ν_j under relation r given the current node ν_i, g_j^(l) and g_j^(l+1) are the representations of node ν_j in layers l and l+1, α_ij denotes the coefficient with which the information of node ν_j is aggregated into node ν_i at layer l+1, β_ij indicates whether nodes ν_i and ν_j are connected (0 if connected, 1 otherwise), β_ik likewise indicates whether nodes ν_i and ν_k are connected (0 if connected, 1 otherwise), the coefficients α_ij are obtained by a softmax over all nodes in which unconnected nodes are masked out through β, E denotes the set of all nodes in the attention relation graph network model, and ν_k denotes the k-th node in E.
9. The dialogue emotion recognition method based on the convolution joint model according to claim 7, wherein the enhanced hidden vector o_i is computed as:
o_i = [ g_i^(L) ; h_i ]
and the prediction vector y_i is computed as:
y_i = softmax( W_d·o_i + b_d )
wherein the value of each dimension of y_i represents the score of the emotion category represented by that dimension, W_d and b_d are trainable parameters that map the dimension of the enhanced hidden vector o_i to the number of emotion categories, and [ ; ] denotes concatenation between tensors.
10. The dialogue emotion recognition method based on the convolution joint model according to claim 9, wherein the convolution joint model is trained with a loss function until it converges to the optimal state, specifically:
the prediction vectors y_1, ..., y_n corresponding to all sentences X_1, ..., X_n of the input dialogue C are aggregated into a prediction vector set;
the cross entropy between the prediction vector set and the true emotion category set Ŷ = {ŷ_1, ..., ŷ_n} corresponding to the input dialogue C is computed as the loss function of the convolution joint model, and the convolution joint model reaches the convergence state by minimizing the loss function;
the loss function ℒ is:
ℒ = − Σ_{i=1}^{n} Σ_{j=1}^{m} ŷ_{i,j} · log p_{i,j}
wherein m is the total number of emotions, p_{i,j} denotes the probability, predicted by the convolution joint model, that sentence X_i belongs to the j-th emotion y_j of the emotion category set Y, and ŷ_{i,j} indicates whether the true emotion category of sentence X_i is y_j: ŷ_{i,j} is 1 if it is and 0 otherwise.
CN202310443460.0A 2023-04-24 2023-04-24 Dialogue emotion recognition method based on convolution joint model Active CN116258134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310443460.0A CN116258134B (en) 2023-04-24 2023-04-24 Dialogue emotion recognition method based on convolution joint model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310443460.0A CN116258134B (en) 2023-04-24 2023-04-24 Dialogue emotion recognition method based on convolution joint model

Publications (2)

Publication Number Publication Date
CN116258134A CN116258134A (en) 2023-06-13
CN116258134B true CN116258134B (en) 2023-08-29

Family

ID=86679580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310443460.0A Active CN116258134B (en) 2023-04-24 2023-04-24 Dialogue emotion recognition method based on convolution joint model

Country Status (1)

Country Link
CN (1) CN116258134B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200119410A (en) * 2019-03-28 2020-10-20 한국과학기술원 System and Method for Recognizing Emotions from Korean Dialogues based on Global and Local Contextual Information
CN112445898A (en) * 2019-08-16 2021-03-05 阿里巴巴集团控股有限公司 Dialogue emotion analysis method and device, storage medium and processor
WO2021132797A1 (en) * 2019-12-27 2021-07-01 한국과학기술원 Method for classifying emotions of speech in conversation by using semi-supervised learning-based word-by-word emotion embedding and long short-term memory model
WO2021139107A1 (en) * 2020-01-10 2021-07-15 平安科技(深圳)有限公司 Intelligent emotion recognition method and apparatus, electronic device, and storage medium
CN114385802A (en) * 2022-01-10 2022-04-22 重庆邮电大学 Common-emotion conversation generation method integrating theme prediction and emotion inference
CN114911932A (en) * 2022-04-22 2022-08-16 南京信息工程大学 Heterogeneous graph structure multi-conversation person emotion analysis method based on theme semantic enhancement
CN115600581A (en) * 2022-12-13 2023-01-13 中国科学技术大学(Cn) Controlled text generation method using syntactic information
CN115841119A (en) * 2023-02-21 2023-03-24 中国科学技术大学 Emotional cause extraction method based on graph structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dialogue Sentiment Analysis Based on a Neural Topic Model; 王建成 (Wang Jiancheng), 徐扬 (Xu Yang), 刘启元 (Liu Qiyuan), 吴良庆 (Wu Liangqing), 李寿山 (Li Shoushan); Journal of Chinese Information Processing (No. 01); full text *

Also Published As

Publication number Publication date
CN116258134A (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN113254803B (en) Social recommendation method based on multi-feature heterogeneous graph neural network
CN109543180B (en) Text emotion analysis method based on attention mechanism
CN109284506B (en) User comment emotion analysis system and method based on attention convolution neural network
CN109472031B (en) Aspect level emotion classification model and method based on double memory attention
CN107145977B (en) Method for carrying out structured attribute inference on online social network user
CN112508085B (en) Social network link prediction method based on perceptual neural network
CN108363695B (en) User comment attribute extraction method based on bidirectional dependency syntax tree representation
CN111274375B (en) Multi-turn dialogue method and system based on bidirectional GRU network
WO2019165944A1 (en) Transition probability network based merchant recommendation method and system thereof
CN111274398A (en) Method and system for analyzing comment emotion of aspect-level user product
CN111400494B (en) Emotion analysis method based on GCN-Attention
CN114443827A (en) Local information perception dialogue method and system based on pre-training language model
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN112667818A (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN115841119B (en) Emotion cause extraction method based on graph structure
CN110472245B (en) Multi-label emotion intensity prediction method based on hierarchical convolutional neural network
CN109614611B (en) Emotion analysis method for fusion generation of non-antagonistic network and convolutional neural network
CN111444399B (en) Reply content generation method, device, equipment and readable storage medium
CN112347756A (en) Reasoning reading understanding method and system based on serialized evidence extraction
CN112612871A (en) Multi-event detection method based on sequence generation model
CN114625882B (en) Network construction method for improving unique diversity of image text description
CN110910235A (en) Method for detecting abnormal behavior in credit based on user relationship network
CN114036298A (en) Node classification method based on graph convolution neural network and word vector
Zhang et al. TS-GCN: Aspect-level sentiment classification model for consumer reviews

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant