CN116258134B - Dialogue emotion recognition method based on convolution joint model - Google Patents

Dialogue emotion recognition method based on convolution joint model

Info

Publication number
CN116258134B
CN116258134B (application CN202310443460.0A; published as CN116258134A)
Authority
CN
China
Prior art keywords
topic
representing
sentence
input
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310443460.0A
Other languages
Chinese (zh)
Other versions
CN116258134A (en)
Inventor
宋彦 (Song Yan)
胡博 (Hu Bo)
田元贺 (Tian Yuanhe)
徐浩培 (Xu Haopei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202310443460.0A priority Critical patent/CN116258134B/en
Publication of CN116258134A publication Critical patent/CN116258134A/en
Application granted granted Critical
Publication of CN116258134B publication Critical patent/CN116258134B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/11 - Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Optimization (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dialogue emotion recognition method based on a convolution joint model, wherein the convolution joint model comprises a neural topic model and an attention relation graph network model. The dialogue emotion recognition method comprises the following step: inputting the sentences of a dialogue into the trained convolution joint model to output the emotion categories corresponding to the sentences of the dialogue. The method makes full use of the implicit topic information of sentences to strengthen the information interaction among dialogue sentences and their feature representations, and helps the decoder predict emotion categories.

Description

Dialogue emotion recognition method based on convolution joint model
Technical Field
The invention relates to the technical field of dialogue emotion recognition, in particular to a dialogue emotion recognition method based on a convolution joint model.
Background
Emotion recognition in a dialogue refers to recognizing the emotion category of every sentence in the dialogue. Existing methods tend to model the relations among the input sentences directly, or only introduce word-level external knowledge to enhance the sentence representations before processing them. Dialogues, however, are characterized by topic jumps, loose structure, information redundancy and the like, so existing dialogue emotion recognition methods cannot adapt to the complex interaction relations among sentence topics in a dialogue, nor can they fully fuse the context information of sentences with similar topics for deep reasoning to achieve accurate emotion recognition.
Disclosure of Invention
In view of the technical problems in the background art, the invention provides a dialogue emotion recognition method based on a convolution joint model, which makes full use of the implicit topic information of sentences to strengthen the information interaction among dialogue sentences and their feature representations, and helps the decoder predict emotion categories.
The invention provides a dialogue emotion recognition method based on a convolution joint model, wherein the convolution joint model comprises a neural topic model and an attention relation graph network model, and the dialogue emotion recognition method comprises the following steps: inputting sentences in the dialogue into the trained convolution joint model to output emotion categories corresponding to the sentences in the dialogue;
the training process of the convolution joint model is as follows:
S1: constructing an input dialogue C = {X_1, X_2, ..., X_n} and encoding it to obtain the encoding vectors h_1, h_2, ..., h_n of all sentences X_1, ..., X_n, where n is the total number of sentences in the dialogue;
S2: using the prior parameters μ and σ of a given dataset, extracting the latent topic distribution Z and training the neural topic model of the variational auto-encoder in an unsupervised manner to obtain a topic feature matrix W_T of size |V| × K, where |V| and K denote the vocabulary size and the number of topics respectively;
S3: using the topic feature matrix W_T, mapping all words of sentence X_i in the input dialogue C to word codes, obtaining the encoding v_i of sentence X_i by average pooling, and computing the self-attention coefficient a_i of each topic based on the encoding v_i;
S4: computing a representation of each topic in the topic feature matrix W_T with a multi-layer perceptron to obtain the topic memory representations m_k, and aggregating the topic memory representations m_k with the self-attention coefficients a_i to obtain the topic representation vector t_i of sentence X_i;
S5: taking the n sentences of the input dialogue C and the corresponding n topic features as nodes of the attention relation graph network model, and using the sentence encoding vectors h_i and the topic representation vectors t_i to generate the 2n initial node representations of the attention relation graph network model, denoted g_1^(0), ..., g_{2n}^(0), where g_j^(0) = h_j when 1 ≤ j ≤ n, and g_j^(0) = t_{j-n} when n < j ≤ 2n;
S6: obtaining the adjacency matrix A and the relation matrix R between nodes from the interconnections and interactions of the different nodes in the attention relation graph network model, and modelling A and R with the attention relation graph network model, where a_{jk} is the edge between any two nodes ν_j and ν_k, a_{jk} is 1 if the two nodes are connected and 0 otherwise, and r_{jk} denotes the label of the edge a_{jk};
S7: based on the adjacency matrix A and the relation matrix R, obtaining the node representations g_i^(L) output by the last layer of the attention relation graph network model, concatenating g_i^(L) with the encoding vector h_i of sentence X_i obtained in step S1 to obtain the enhanced hidden vector o_i of sentence X_i, feeding the enhanced hidden vector o_i to a decoder to output the prediction vector y_i, and selecting the emotion category with the highest score in the prediction vector y_i as the predicted emotion category of sentence X_i;
wherein the formulas used to extract the latent topic distribution Z from the prior parameters μ and σ of the given dataset and obtain the topic feature matrix W_T are specifically as follows:
e = MLP(x), z_i = μ + σ · ε, p_i = softmax(W_T · z_i + b)
wherein p_i denotes the prediction probability over the vocabulary of the i-th word w_i of an input sentence X in the given dataset, z_i denotes the topic distribution of the i-th word w_i, W_T denotes the trainable topic feature matrix, b denotes a trainable vector, μ and σ denote the prior parameters of the given dataset, ε is a random variable, MLP(·) denotes a multi-layer perceptron, x denotes the one-hot encoding corresponding to the input sentence X, e denotes the implicit representation of the one-hot encoding x, e_i is the implicit representation of the i-th word w_i, w_i is a word of the input sentence X, and x is obtained by one-hot encoding all words of the input sentence X.
Further, in step S1, constructing the input dialogue C and encoding it to obtain the encoding vectors h_1, ..., h_n of all sentences X_1, ..., X_n is specifically:
the input dialogue C is fed into a RoBERTa encoder for encoding to obtain the initial encoding vectors u_1, ..., u_n of all sentences in C;
the initial encoding vectors u_1, ..., u_n corresponding to all sentences are fed into a BiLSTM network to obtain the encoding vectors h_1, ..., h_n of all sentences.
Further, in step S2: using a priori parameters of the given dataset and />Extracting potential topic distribution->Training the neural topic model of the variational self-encoder in an unsupervised manner to obtain a topic feature matrix +.>Specifically, the method comprises the following steps:
in addition, the given data set is input into a neural topic model of the variable self-encoder, and the neural topic model processes the given data set as follows:
input sentences in given data setProcessing to obtain a single thermal code->One-time heat encoding->Delivering to a multi-layer perceptron to obtain said input sentenceXImplicit representation of +.>
Based on implicit representationA priori parameters of the potential topic distribution Z> and />Estimation is performed from a priori parameters +.> and />Random decimation in the topic distribution Z of the representation>As said input sentence->Wherein the implicit representation +.>Is the firstPersonal word->Implicit representation of the word->For inputting sentencesXWords of (a);
trainable theme feature matrixAnd trainable vector->Representation of potential topic->Performing linear transformation and->After the function operation, the predictive probability of the word is obtained>
Training in an unsupervised mannerTraining the neural topic model, and then obtaining a topic feature matrixEach line is the +.>Personal word embedding->Each column is +.>Personal topic embedding->
Further, in step S3: using topic feature matricesWill input dialogue->Chinese sentence->Is mapped to word codes, and the sentence ++is obtained by averaging pooling>Coding of->Based on the coding->Calculating the self-attention coefficient of each topic +.>Specifically, the method comprises the following steps:
using topic feature matricesWill input dialogue->Chinese sentence->Mapping all words of (2) to word code +.>,/>For statement->The total number of midwords;
encoding wordsPerforming average pooling to obtain sentence->Coding of->Statement->Is>The subject is code->In->Numerical value of individual dimension>Based on multiple layersSensor logarithmic value->The dimension of (2) is expanded to obtain an expansion vector +.>
Based on expansion vectorAnd coding vector->Calculating the self-attention coefficient of each theme>
Further, the encoding v_i and the related quantities are computed as follows:
v_i = AvgPool(q_{i,1}, ..., q_{i,T_i}), d_{i,k} = MLP_d(v_{i,k}), a_i = softmax(s_i)
wherein AvgPool(·) denotes average pooling, v_{i,k} denotes the probability that the i-th sentence is related to the k-th topic, d_{i,k} denotes the expansion vector obtained by expanding the dimension of the value v_{i,k}, s_i denotes the scores computed from the expansion vectors d_{i,k} and the encoding vector v_i, softmax(·) denotes the softmax function, and MLP_d(·) denotes a multi-layer perceptron that maps a single probability value to a high-dimensional vector.
Further, the topic memory representation m_k is computed as:
m_k = MLP_m(T_k), k = 1, ..., K
wherein T_k is the k-th topic embedding of the topic feature matrix W_T, MLP_m(·) denotes a multi-layer perceptron that maps each topic embedding to a vector m_k of a fixed dimension, and K denotes the number of topics.
Further, step S6 specifically comprises:
obtaining the adjacency matrix A and the relation matrix R between nodes from the interconnections and interactions of the different nodes in the attention relation graph network model;
based on the adjacency matrix A and the relation matrix R, the attention relation graph network model G = (E, A, R) models the 2n initial nodes, where the edge between any node pair (ν_j, ν_k) in the node set E is mapped to one element a_{jk} of the adjacency matrix A, and the element a_{jk} can in turn be mapped to one element r_{jk} of the relation matrix R;
the values of the elements in the relation matrix R fall into three types: utterance-utterance, topic-topic and topic-utterance; these three types are the edge types of the attention relation graph network model, and each edge type comprises different kinds of values;
for utterance-utterance edges, eight kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C and whether the two sentences come from the same speaker: adjacent-future-self, adjacent-future-other, adjacent-past-self, adjacent-past-other, distant-future-self, distant-future-other, distant-past-self, distant-past-other;
for topic-topic edges, four kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C: adjacent-future, adjacent-past, distant-future, distant-past;
for topic-utterance edges, a single kind of value is introduced: influence.
Further, in step S7, for each node representation g_i^(l), the information of the other nodes g_j^(l) connected to node ν_i is aggregated into node ν_i, giving the updated node representation g_i^(l+1):
g_i^(l+1) = ReLU( W_0^(l)·g_i^(l) + Σ_{r∈R_i} Σ_{ν_j∈E_i} α_ij·W_r^(l)·g_j^(l) )
wherein g_i^(l+1) denotes the representation of node ν_i in layer l+1 of the attention relation graph network model, g_i^(l) is the representation of node ν_i in layer l, E_i is the set of other nodes connected to node ν_i, R_i is the set of possible values of the relation matrix R for edges whose starting node is ν_i, W_0^(l) is the self-connection matrix of layer l of the attention relation graph network model, W_r^(l) is the matrix of layer l that extracts the information of other nodes ν_j under relation r given the current node ν_i, g_j^(l) and g_j^(l+1) are the representations of node ν_j in layers l and l+1, α_ij denotes the coefficient with which the information of node ν_j is aggregated into node ν_i at layer l+1, β_ij indicates whether nodes ν_i and ν_j are connected (0 if connected, 1 otherwise), β_ik likewise indicates whether nodes ν_i and ν_k are connected (0 if connected, 1 otherwise), the coefficients α_ij are obtained by a softmax over all nodes in which unconnected nodes are masked out through β, E denotes the set of all nodes in the attention relation graph network model, and ν_k denotes the k-th node in E.
Further, the enhanced hidden vector o_i is computed as:
o_i = [ g_i^(L) ; h_i ]
and the prediction vector y_i is computed as:
y_i = softmax( W_d·o_i + b_d )
wherein the value of each dimension of y_i represents the score of the emotion category represented by that dimension, W_d and b_d are trainable parameters that map the dimension of the enhanced hidden vector o_i to the number of emotion categories, and [ ; ] denotes concatenation between tensors.
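A minimal illustrative sketch of such a decoder is given below, assuming PyTorch; the class name, dimensions and argument names are illustrative assumptions and not taken from this embodiment.

```python
# Hedged sketch of the decoder described above: it concatenates the last-layer graph
# representation with the sentence encoding and maps the result to emotion scores.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmotionDecoder(nn.Module):
    def __init__(self, graph_dim, sent_dim, num_emotions):
        super().__init__()
        # W_d / b_d: trainable mapping from the enhanced hidden vector to emotion scores
        self.proj = nn.Linear(graph_dim + sent_dim, num_emotions)

    def forward(self, g_L, h):
        o = torch.cat([g_L, h], dim=-1)          # enhanced hidden vector o_i = [g_i^(L); h_i]
        y = F.softmax(self.proj(o), dim=-1)      # prediction vector y_i (one score per emotion)
        return y, y.argmax(dim=-1)               # predicted emotion = highest-scoring category
```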
Further, the convolution joint model is trained with a loss function until it converges to the optimal state, specifically:
the prediction vectors y_1, ..., y_n corresponding to all sentences X_1, ..., X_n of the input dialogue C are aggregated into a prediction vector set;
the cross entropy between the prediction vector set and the true emotion category set Ŷ = {ŷ_1, ..., ŷ_n} corresponding to the input dialogue C is computed as the loss function of the convolution joint model, and the convolution joint model reaches the convergence state by minimizing this loss function;
the loss function ℒ is:
ℒ = − Σ_{i=1}^{n} Σ_{j=1}^{m} ŷ_{i,j} · log p_{i,j}
wherein m is the total number of emotions, p_{i,j} denotes the probability, predicted by the convolution joint model, that sentence X_i belongs to the j-th emotion y_j of the emotion category set Y, and ŷ_{i,j} indicates whether the true emotion category of sentence X_i is y_j: ŷ_{i,j} is 1 if it is and 0 otherwise.
The dialogue emotion recognition method based on the convolution joint model has the following advantages: with the structure provided above, sentence topic features are extracted by the neural topic model, so that dialogue sentences with similar topics can use each other's context information for joint reasoning, which alleviates problems such as frequent topic jumps and information redundancy in dialogues and improves the representation quality of the utterances; the relation-driven sentence information and topic information are fused through the attention relation graph network model to obtain enhanced feature representations, which improves the emotion recognition performance of the model on the dialogue.
Drawings
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a framework diagram of the construction of a convolution joint model.
Detailed Description
In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit or scope of the invention, which is therefore not limited to the specific embodiments disclosed below.
As shown in fig. 1 and fig. 2, in the dialogue emotion recognition method based on the convolution joint model, the sentences of a dialogue are input into the trained convolution joint model to output the emotion categories corresponding to the sentences of the dialogue; the convolution joint model comprises an encoder, a neural topic model, an attention relation graph network model and a decoder which are connected in sequence, so that the emotion categories are output through the decoder.
In this embodiment, external topic knowledge is introduced by a neural topic model (whose backbone is a variational auto-encoder) to extract the topic features of each sentence, and the relations among sentences, among topic features, and between sentences and topic features are modelled through the attention relation graph network model, so that the implicit topic information of the sentences is fully used to strengthen the information interaction among dialogue sentences and their feature representations and to help the decoder predict emotion categories; the emotion categories are output by the convolution joint model, and the method comprises the following steps:
to facilitate a detailed description of the convolution joint model training process, the following symbol labels are introduced:
the emotion category set is denoted Y = {y_1, ..., y_m}, where y_j is the j-th emotion and m is the total number of emotions;
the input dialogue is denoted C = {X_1, ..., X_n}, where X_i is the i-th sentence of the dialogue and n is the total number of sentences in the dialogue;
the true emotion category set corresponding to the input dialogue C is denoted Ŷ = {ŷ_1, ..., ŷ_n}, where ŷ_i indicates the true emotion category of sentence X_i;
the attention relation graph network model is denoted G = (E, A, R), where E is the set of nodes in the attention relation graph network model, A is the adjacency matrix between nodes, R is the relation matrix between nodes, and the number of layers of the attention relation graph network model is denoted L.
The training process of the convolution joint model is as follows:
S1: construct the input dialogue C = {X_1, ..., X_n} and encode it to obtain the encoding vectors h_1, ..., h_n of all sentences X_1, ..., X_n, where n is the total number of sentences in the dialogue; this specifically comprises steps S11 to S12;
S11: the sentences of the input dialogue C are fed into a RoBERTa encoder for encoding to obtain the initial encoding vectors u_1, ..., u_n of all sentences in C;
S12: the initial encoding vectors u_1, ..., u_n corresponding to all sentences are fed into a BiLSTM network to obtain the encoding vectors h_1, ..., h_n of all sentences, where the BiLSTM network is an existing bidirectional long short-term memory network;
Steps S11 to S12 encode each sentence of the input dialogue C. Through deep text encoding and rich semantic representation, the RoBERTa encoder helps the BiLSTM (bidirectional long short-term memory) model better understand the words and semantic information in the sentences; through its sequence modelling and long-range dependency modelling ability, the BiLSTM model captures the context structure of the dialogue and the long-distance dependencies between sentences; combining the two (RoBERTa encoder and BiLSTM model) handles complex semantic relations in the sentences better and thus improves the quality and expressiveness of the sentence encodings.
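A minimal sketch of this two-stage encoder is given below, assuming PyTorch and the Hugging Face `transformers` RoBERTa checkpoint; the class name, hidden size and pooling choice are illustrative assumptions, not part of this embodiment.

```python
# Hedged sketch of step S1: RoBERTa produces an initial vector per sentence, and a
# BiLSTM over the sequence of sentence vectors yields the dialogue-aware encodings h_i.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class SentenceEncoder(nn.Module):
    def __init__(self, hidden_size=300):
        super().__init__()
        self.tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        self.bilstm = nn.LSTM(self.roberta.config.hidden_size, hidden_size,
                              batch_first=True, bidirectional=True)

    def forward(self, sentences):
        # sentences: list of n strings, one dialogue C = {X_1, ..., X_n}
        batch = self.tokenizer(sentences, padding=True, truncation=True,
                               return_tensors="pt")
        out = self.roberta(**batch).last_hidden_state   # (n, seq_len, 768)
        u = out[:, 0, :]                                 # initial encoding vectors u_i
        h, _ = self.bilstm(u.unsqueeze(0))               # BiLSTM over the dialogue
        return h.squeeze(0)                              # encoding vectors h_1 ... h_n
```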
S2: using a priori parameters of the given dataset and />Extracting potential topic distribution->Training the neural topic model of the variational self-encoder in an unsupervised manner to obtain a topic feature matrix +.>, wherein /> and />Respectively representing the vocabulary size and the number of topics;
in addition, the given data set is input into a neural topic model of the variable self-encoder, and the neural topic model processes the given data set as follows, specifically comprising steps S21 to S4:
s21: in addition, given data set, input sentences in the given data setProcessing to obtain a single thermal code->One-time heat encoding->Delivering to a multi-layer perceptron to obtain said input sentence +.>Implicit representation of +.>
wherein ,representing statement +.>Corresponding one-hot coding,/->A multi-layer sensor is shown as such,,/>representing one-hot code->Implicit representation of->Representing +.>Performing single-heat coding on all words in the list;
it should be noted that the additional given data set and the construction of the input dialogNot belonging to the same training set, the further given data set may be expressed in particular as the further input sentence +.>,/>(/>) Representing input sentence +.>The word of (the word is specifically associated with the input sentence +.>The words in (a) are corresponding).
S22: based on implicit representationA priori parameters of the potential topic distribution Z> and />Estimation is performed from a priori parameters +.> and />Random decimation in the topic distribution Z of the representation>As said input sentence->Wherein the implicit representation +.>Is->Personal word->Implicit representation of the word->For inputting sentencesXThe words in (a) are specifically:
wherein , and />A priori parameters representing a given dataset, +.>Is a random variable, it being understood that,/-is>Is->Personal word->At->Implicit representation of the corresponding in (a);
s23: trainable theme feature matrixAnd trainable vector->Representation of potential topic->Performing linear transformation and->After the function operation, the predictive probability of the word is obtained>
wherein ,representing input sentence +.>Personal word->The prediction probability of each word in the corresponding vocabulary can be used for training parameter learning in a neural topic model based on a VAE (variable value) unsupervised mode; />For entering sentence +.>Personal word->The topic distribution of (2) can be used for participating in the subsequent training of parameter learning in a neural topic model based on a VAE (variable value) unsupervised mode;representing a trainable topic feature matrix, +.>Representing trainable vectors, optimizing +_during training of neural topic models> and />These two learnable parameters to bring the neural topic model to the desired output;
S24: the neural topic model is trained in an unsupervised manner, after which the topic feature matrix W_T of size |V| × K is obtained; its v-th row is the embedding of the v-th word and its k-th column is the embedding T_k of the k-th topic;
The given dataset of step S21, after the data processing of steps S21 to S23, serves as the input of the neural topic model in step S24, and the neural topic model is trained in an unsupervised manner.
wherein |V| and K are the vocabulary size and the number of topics respectively; each row of W_T can be regarded as the word embedding of a specific word, each dimension of which corresponds to a probability value of that word with respect to a particular topic; likewise, each column of W_T can be regarded as the topic embedding of a particular topic (the k-th topic embedding is denoted T_k), each dimension of which corresponds to a probability value of that topic with respect to a particular word.
Steps S21 to S24 train the topic feature matrix of the topic model; each row of the trained topic feature matrix represents a word embedding and each column represents a topic embedding, so that the representations of words and of topics are naturally connected through the topic feature matrix; each element of the topic feature matrix describes the correlation between one particular word and one particular topic, and the word representations and topic representations merely reflect the information of the topic feature matrix along different dimensions; using the obtained topic feature matrix, the topic representation corresponding to a given word representation (or sentence representation) can be obtained, and the value of each dimension of that topic representation is the probability that the word (or sentence) is related to the corresponding topic.
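A hedged sketch of such a VAE-style neural topic model is given below, assuming PyTorch; the layer sizes, the standard-normal prior and the training objective are assumptions chosen to match the usual variational formulation, not details taken from this embodiment.

```python
# Hedged sketch of steps S21-S24: bag-of-words encoding -> implicit representation e ->
# parameters (mu, sigma) -> sampled latent topic z -> word prediction p via W_T and b.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTopicModel(nn.Module):
    def __init__(self, vocab_size, num_topics, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, num_topics)        # estimates mu from e
        self.to_logsigma = nn.Linear(hidden, num_topics)  # estimates sigma from e
        # Topic feature matrix W_T (|V| x K): rows ~ word embeddings, columns ~ topic embeddings.
        self.W_T = nn.Parameter(torch.randn(vocab_size, num_topics) * 0.01)
        self.b = nn.Parameter(torch.zeros(vocab_size))

    def forward(self, x_bow):
        # x_bow: (batch, |V|) one-hot / bag-of-words counts of an input sentence X
        e = self.encoder(x_bow)                           # implicit representation e
        mu, logsigma = self.to_mu(e), self.to_logsigma(e)
        eps = torch.randn_like(mu)                        # random variable epsilon
        z = mu + torch.exp(logsigma) * eps                # latent topic drawn from Z
        p = F.softmax(z @ self.W_T.t() + self.b, dim=-1)  # word prediction probabilities
        return p, mu, logsigma

    def loss(self, x_bow):
        # Unsupervised objective: reconstruction + KL term (standard VAE-style NTM).
        p, mu, logsigma = self.forward(x_bow)
        recon = -(x_bow * torch.log(p + 1e-10)).sum(-1)
        kl = -0.5 * (1 + 2 * logsigma - mu ** 2 - torch.exp(2 * logsigma)).sum(-1)
        return (recon + kl).mean()
```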
S3: using topic feature matricesWill input dialogue->Chinese sentence->Is mapped to word codes, and the sentence ++is obtained by averaging pooling>Coding of->Based on the coding->Calculating the self-attention coefficient of each topic +.>Specifically, steps S31 to S33 are included:
s31: using topic feature matricesWill input dialogue->Chinese sentence->Mapping all words of (2) to word code +.>,/>For statement->The total number of midwords;
s32: encoding wordsPerforming average pooling to obtain sentence->Coding of->Statement->Is>The subject is code->In->Numerical value of individual dimension>Based on the logarithmic value of the multilayer sensor>Expanding the dimension to obtain an expansion vector
Due to codingWord code projected by all topic feature matrices +.>Averaging the pooled results, encoding +.>The word code +.>I.e. each dimension represents a correlation with a certain topic, i.e.: coding->Is sentence->Is represented by +.>Probability associated with the corresponding topic->I.e. +.>For vector representation, +.>Is represented by a numerical value;
wherein ,representing average pooling>Indicate->Personal statement and->Probability of being related to individual topic,/->Express logarithmic value +.>Expansion vector obtained after expansion of dimension of (2),. About.>Representation->Function (F)>Representing a multi-layer perceptron for mapping individual probability values into vectors of high dimensionality.
S33: based on expansion vectorAnd coding vector->Calculating the self-attention coefficient of each theme>
wherein ,representation->Function (F)>Representing a multi-layer perceptron.
S4: calculating a topic feature matrix using multi-layer perceptronsThe representation of each topic in (a) gives the topic memory representation +.>By means of the self-attention coefficient->Characterization of the subject memory>Aggregation is carried out to obtain statement +.>Subject characterization vector->
wherein ,for the topic feature matrix->The%>Personal topic embedding->Representing a multi-layered perceptron for embedding and mapping each topic as a vector + ->Dimension of->Representing the number of topics; it should be noted that->Related to the topic feature matrix, i.e. to +.>Related, and sentence code->Different.
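The exact form of the self-attention over topics is not fully specified above, so the sketch below, assuming PyTorch, fills that step with an assumed scoring MLP; the class name, dimensions and the way d_{i,k} and v_i are combined are all illustrative assumptions.

```python
# Hedged sketch of steps S3-S4: word codes from W_T -> average pooling -> per-topic
# expansion vectors -> self-attention over topics -> aggregated topic representation t_i.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicRepresentation(nn.Module):
    def __init__(self, W_T, dim=300):
        super().__init__()
        self.W_T = W_T                        # (|V|, K) topic feature matrix from the NTM
        self.expand = nn.Linear(1, dim)       # MLP_d: expands each scalar v_{i,k} to d_{i,k}
        self.mem = nn.Linear(W_T.shape[0], dim)  # MLP_m: maps each topic embedding T_k to m_k
        self.score = nn.Linear(2 * dim, 1)    # assumed scorer combining d_{i,k} with context

    def forward(self, word_ids):
        # word_ids: LongTensor with the vocabulary indices of the words of sentence X_i
        q = self.W_T[word_ids]                # word codes q_{i,j}, shape (T_i, K)
        v = q.mean(dim=0)                     # sentence encoding v_i by average pooling, (K,)
        d = self.expand(v.unsqueeze(-1))      # expansion vectors d_{i,k}, (K, dim)
        m = self.mem(self.W_T.t())            # topic memories m_k = MLP_m(T_k), (K, dim)
        ctx = d.mean(dim=0, keepdim=True).expand_as(d)   # assumed sentence-level context
        a = F.softmax(self.score(torch.cat([d, ctx], -1)).squeeze(-1), dim=0)  # a_{i,k}
        t_i = (a.unsqueeze(-1) * m).sum(dim=0)            # topic representation vector t_i
        return t_i
```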
Through steps S2 to S4, the topic feature matrix in the neural topic model is trained and the topic features of the utterances (topic representation vectors) are extracted, which realises effective training of the neural topic model.
Steps S2 to S4 provide a mechanism for enhancing dialogue sentence representations with the neural topic model: by extracting sentence topic features with the neural topic model, dialogue sentences with similar topics can use each other's context information for joint reasoning, which alleviates problems such as frequent topic jumps and information redundancy in dialogues and improves the representation quality of the utterances.
S5: take the n sentences of the input dialogue C and the corresponding n topic features as nodes of the attention relation graph network model, and use the sentence encoding vectors h_i and the topic representation vectors t_i to generate the 2n initial node representations of the attention relation graph network model, denoted g_1^(0), ..., g_{2n}^(0), where g_j^(0) = h_j when 1 ≤ j ≤ n and g_j^(0) = t_{j-n} when n < j ≤ 2n.
S6: obtaining an adjacency matrix between nodes according to the interconnection and action relation of different nodes in the attention relation graph network modelAnd relation matrix->And using the attention relationship graph network model to +.>And relation matrix->Modeling is performed in which->For any two nodes->Edge between, if two nodes +.>Connect with->1, otherwise->0->Representing edge->The marked values include in particular:
s61: obtaining an adjacency matrix between nodes according to the interconnection and action relation of different nodes in the attention relation graph network modelAnd relation matrix->
S62: based on adjacency matrixAnd relation matrix->To make full use of the relationships between sentences and subject features in a dialog, a attention relationship graph network model is utilized>For->Modeling of the initial nodes, wherein->Is +.>Are mapped to adjacency matrix->One element of->While element->Can be mapped to a relation matrix->One element of->
Wherein the values of the elements in the relation matrix R fall into three types: utterance-utterance, topic-topic and topic-utterance; these three types are the edge types of the attention relation graph network model, and each edge type comprises different kinds of values;
for utterance-utterance edges, eight kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C and whether the two sentences come from the same speaker: adjacent-future-self, adjacent-future-other, adjacent-past-self, adjacent-past-other, distant-future-self, distant-future-other, distant-past-self, distant-past-other;
for topic-topic edges, four kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C: adjacent-future, adjacent-past, distant-future, distant-past;
for topic-utterance edges, a single kind of value is introduced: influence. A sketch of this construction is given below.
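The sketch below builds A and R from a list of speaker labels; the node layout (sentence nodes first, topic nodes second), the adjacency window and the integer relation ids are illustrative assumptions, not details fixed by this embodiment.

```python
# Hedged sketch of step S6: constructing the adjacency matrix A and relation matrix R
# with the three edge types (utterance-utterance, topic-topic, topic-utterance).
import numpy as np

def build_graph(speakers, window=1):
    n = len(speakers)              # one speaker label per sentence X_1 ... X_n
    N = 2 * n                      # n sentence nodes + n topic nodes
    A = np.zeros((N, N), dtype=int)
    R = np.full((N, N), -1, dtype=int)
    rel_ids = {}                   # maps a relation name to an integer label

    def rel(name):
        return rel_ids.setdefault(name, len(rel_ids))

    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            adj = "adjacent" if abs(i - j) <= window else "distant"
            tense = "future" if j > i else "past"
            who = "self" if speakers[i] == speakers[j] else "other"
            # utterance-utterance edge: eight kinds of values
            A[i, j] = 1
            R[i, j] = rel(f"utt-utt/{adj}-{tense}-{who}")
            # topic-topic edge between the corresponding topic nodes: four kinds of values
            A[n + i, n + j] = 1
            R[n + i, n + j] = rel(f"top-top/{adj}-{tense}")
        # topic-utterance edge: a single kind of value, "influence"
        A[n + i, i] = A[i, n + i] = 1
        R[n + i, i] = R[i, n + i] = rel("top-utt/influence")
    return A, R, rel_ids
```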
S7: based on adjacency matrixAnd relation matrix->Representing the node of the last layer output in the attention relation graph network model by +.>Sum sentence->The corresponding coding vector in step S1 +.>Performing adhesion to obtain sentence->Is (are) enhanced hidden vector->The enhancement hidden vector +.>Output of prediction vector by decoder>Selecting said predictive vector->Emotion category with highest score as sentence +.>Is used for predicting emotion classification;
characterization for each nodeWill be->Other nodes with connections->Is aggregated to a nodeIn, the updated node representation +.>
wherein ,representing node->In->Corresponding node representation in the layer attention relationship graph network model,/-for>For node->In->Corresponding node representation in the layer attention relationship graph network model,/-for>Is +.>A set of other nodes with connections, +.>Is a relation matrix->The starting node is->Is a set of possible values of ∈ ->Is the attention relation graph network model +.>Matrix for self-connection in layer, +.>Is->In the layer attention relation graph network model, the current node +.>In relation->For extracting other nodes under the condition->Matrix of information->For node->In->Corresponding node representation in the layer attention relationship graph network model,/-for>For node->In->Corresponding node representation in the layer attention relationship graph network model,/-for>Indicate->Node +.>For->Aggregate information coefficient,/->Representing node->And->Whether there is a connection between them, if there is a connection +.>0, otherwise->1->Representing node->And->Whether there is a connection between them, if there is a connection +.>0, otherwise->1->Representing a set of all nodes in the attention relationship graph network model; />Representation->Middle->Personal node->Representing the connection between tensors.
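A hedged sketch of one such layer is given below, assuming PyTorch; because the text only states that unconnected nodes are masked out of the softmax, the concrete attention scorer is an assumption.

```python
# Hedged sketch of the layer update in step S7: relation-specific transforms W_r,
# a self-connection W_0, and attention coefficients alpha masked by the adjacency matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationGraphAttentionLayer(nn.Module):
    def __init__(self, dim, num_relations):
        super().__init__()
        self.W0 = nn.Linear(dim, dim, bias=False)                  # self-connection matrix W_0
        self.Wr = nn.ModuleList(nn.Linear(dim, dim, bias=False)
                                for _ in range(num_relations))     # per-relation matrices W_r
        self.att = nn.Linear(2 * dim, 1)                           # assumed attention scorer

    def forward(self, g, A, R):
        # g: (N, dim) node representations g^(l); A: (N, N) 0/1 tensor; R: (N, N) relation ids (-1 = none)
        N = g.size(0)
        scores = self.att(torch.cat([g.unsqueeze(1).expand(N, N, -1),
                                     g.unsqueeze(0).expand(N, N, -1)], dim=-1)).squeeze(-1)
        scores = scores.masked_fill(A == 0, float("-inf"))         # beta mask: drop unconnected nodes
        alpha = torch.nan_to_num(F.softmax(scores, dim=-1))        # aggregation coefficients alpha_ij
        out = self.W0(g)                                           # self term W_0 g_i
        for r, Wr in enumerate(self.Wr):
            mask = (R == r).float() * alpha                        # neighbours reached under relation r
            out = out + mask @ Wr(g)                               # sum_r sum_j alpha_ij W_r g_j
        return F.relu(out)                                         # updated representations g^(l+1)
```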
According to steps S5 to S7, the relation-driven dialogue sentence information and topic information are fused through the attention relation graph network model to obtain enhanced feature representations, which improves the emotion recognition performance of the model on the dialogue.
Through steps S1 to S7, the neural topic model and the attention relation graph network model are combined: the topic representation vectors output by the neural topic model participate in the initial node representations of the attention relation graph network model, so that the final convolution joint model achieves accurate and efficient recognition of dialogue emotions.
In this embodiment, the convolution joint model is trained with a loss function until it converges to the optimal state, specifically:
the prediction vectors y_1, ..., y_n corresponding to all sentences X_1, ..., X_n of the input dialogue C are aggregated into a prediction vector set;
the cross entropy between the prediction vector set and the true emotion category set Ŷ = {ŷ_1, ..., ŷ_n} corresponding to the input dialogue C is computed as the loss function of the convolution joint model, and the convolution joint model reaches the convergence state by minimizing this loss function;
the loss function ℒ is:
ℒ = − Σ_{i=1}^{n} Σ_{j=1}^{m} ŷ_{i,j} · log p_{i,j}
wherein m is the total number of emotions, p_{i,j} denotes the probability, predicted by the convolution joint model, that sentence X_i belongs to the j-th emotion y_j of the emotion category set Y, and ŷ_{i,j} indicates whether the true emotion category of sentence X_i is y_j: ŷ_{i,j} is 1 if it is and 0 otherwise.
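A minimal sketch of this cross-entropy objective is given below, assuming PyTorch; the function name and the use of a mean instead of a sum over sentences are illustrative choices.

```python
# Hedged sketch of the training objective: cross entropy between the prediction
# vectors of all sentences in the dialogue and their true emotion categories.
import torch
import torch.nn.functional as F

def dialogue_emotion_loss(pred, gold):
    # pred: (n, m) prediction vectors y_1..y_n (probabilities over m emotion categories)
    # gold: (n,)  true emotion category indices for sentences X_1..X_n
    return -torch.log(pred[torch.arange(pred.size(0)), gold] + 1e-10).mean()

# Equivalent when `pred` holds raw (pre-softmax) scores: F.cross_entropy(pred, gold)
```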
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification made, within the technical scope disclosed by the present invention, by a person skilled in the art according to the technical solution and the inventive concept of the present invention shall be covered by the scope of protection of the present invention.

Claims (10)

1. A dialogue emotion recognition method based on a convolution joint model, wherein the convolution joint model comprises a neural topic model and an attention relation graph network model, the dialogue emotion recognition method comprising: inputting the sentences of a dialogue into the trained convolution joint model to output the emotion categories corresponding to the sentences of the dialogue;
the training process of the convolution joint model being as follows:
S1: constructing an input dialogue C = {X_1, X_2, ..., X_n} and encoding it to obtain the encoding vectors h_1, h_2, ..., h_n of all sentences X_1, ..., X_n, where n is the total number of sentences in the dialogue;
S2: using the prior parameters μ and σ of a given dataset, extracting the latent topic distribution Z and training the neural topic model of the variational auto-encoder in an unsupervised manner to obtain a topic feature matrix W_T of size |V| × K, where |V| and K denote the vocabulary size and the number of topics respectively;
S3: using the topic feature matrix W_T, mapping all words of sentence X_i in the input dialogue C to word codes, obtaining the encoding v_i of sentence X_i by average pooling, and computing the self-attention coefficient a_i of each topic based on the encoding v_i;
S4: computing a representation of each topic in the topic feature matrix W_T with a multi-layer perceptron to obtain the topic memory representations m_k, and aggregating the topic memory representations m_k with the self-attention coefficients a_i to obtain the topic representation vector t_i of sentence X_i;
S5: taking the n sentences of the input dialogue C and the corresponding n topic features as nodes of the attention relation graph network model, and using the sentence encoding vectors h_i and the topic representation vectors t_i to generate the 2n initial node representations of the attention relation graph network model, denoted g_1^(0), ..., g_{2n}^(0), where g_j^(0) = h_j when 1 ≤ j ≤ n, and g_j^(0) = t_{j-n} when n < j ≤ 2n;
S6: obtaining the adjacency matrix A and the relation matrix R between nodes from the interconnections and interactions of the different nodes in the attention relation graph network model, and modelling A and R with the attention relation graph network model, where a_{jk} is the edge between any two nodes ν_j and ν_k, a_{jk} is 1 if the two nodes are connected and 0 otherwise, and r_{jk} denotes the label of the edge a_{jk};
S7: based on the adjacency matrix A and the relation matrix R, obtaining the node representations g_i^(L) output by the last layer of the attention relation graph network model, concatenating g_i^(L) with the encoding vector h_i of sentence X_i obtained in step S1 to obtain the enhanced hidden vector o_i of sentence X_i, feeding the enhanced hidden vector o_i to a decoder to output the prediction vector y_i, and selecting the emotion category with the highest score in the prediction vector y_i as the predicted emotion category of sentence X_i;
wherein the formulas used to extract the latent topic distribution Z from the prior parameters μ and σ of the given dataset and obtain the topic feature matrix W_T are specifically as follows:
e = MLP(x), z_i = μ + σ · ε, p_i = softmax(W_T · z_i + b)
wherein p_i denotes the prediction probability over the vocabulary of the i-th word w_i of an input sentence X in the given dataset, z_i denotes the topic distribution of the i-th word w_i, W_T denotes the trainable topic feature matrix, b denotes a trainable vector, μ and σ denote the prior parameters of the given dataset, ε is a random variable, MLP(·) denotes a multi-layer perceptron, x denotes the one-hot encoding corresponding to the input sentence X, e denotes the implicit representation of the one-hot encoding x, e_i is the implicit representation of the i-th word w_i, w_i is a word of the input sentence X, and x is obtained by one-hot encoding all words of the input sentence X.
2. The dialogue emotion recognition method based on the convolution joint model according to claim 1, wherein in step S1, constructing the input dialogue C and encoding it to obtain the encoding vectors h_1, ..., h_n of all sentences is specifically:
feeding the input dialogue C into a RoBERTa encoder for encoding to obtain the initial encoding vectors u_1, ..., u_n of all sentences in the input dialogue C;
feeding the initial encoding vectors u_1, ..., u_n corresponding to all sentences into a BiLSTM network to obtain the encoding vectors h_1, ..., h_n of all sentences.
3. The dialogue emotion recognition method based on the convolution joint model according to claim 1, wherein in step S2, training the neural topic model of the variational auto-encoder in an unsupervised manner to obtain the topic feature matrix W_T is specifically:
a further given dataset is input into the neural topic model of the variational auto-encoder, which processes the given dataset as follows:
each input sentence X in the given dataset is processed to obtain its one-hot encoding x, and x is passed to a multi-layer perceptron to obtain the implicit representation e of the input sentence X;
based on the implicit representation e, the prior parameters μ and σ of the latent topic distribution Z are estimated, and a latent topic z_i is drawn at random from the topic distribution Z characterized by μ and σ as the latent topic of the input sentence X, where e_i in the implicit representation e is the implicit representation of the i-th word w_i and w_i is a word of the input sentence X;
the trainable topic feature matrix W_T and the trainable vector b apply a linear transformation followed by a softmax operation to the latent topic z_i, giving the prediction probability p_i of the word;
the neural topic model is trained in an unsupervised manner, after which the topic feature matrix W_T of size |V| × K is obtained, whose v-th row is the embedding of the v-th word and whose k-th column is the embedding T_k of the k-th topic.
4. The dialogue emotion recognition method based on the convolution joint model according to claim 1, wherein in step S3, mapping all words of sentence X_i to word codes, obtaining the encoding v_i by average pooling and computing the self-attention coefficient a_i of each topic is specifically:
using the topic feature matrix W_T, all words of sentence X_i in the input dialogue C are mapped to the word codes q_{i,1}, ..., q_{i,T_i}, where T_i is the total number of words in sentence X_i;
the word codes are average-pooled to obtain the encoding v_i of sentence X_i; the value v_{i,k} of the k-th dimension of v_i corresponds to the k-th topic, and a multi-layer perceptron expands the dimension of the value v_{i,k} to obtain the expansion vector d_{i,k};
based on the expansion vectors d_{i,k} and the encoding vector v_i, the self-attention coefficient a_i of each topic is computed.
5. The dialogue emotion recognition method based on the convolution joint model according to claim 4, wherein the encoding v_i and the related quantities are computed as follows:
v_i = AvgPool(q_{i,1}, ..., q_{i,T_i}), d_{i,k} = MLP_d(v_{i,k}), a_i = softmax(s_i)
wherein AvgPool(·) denotes average pooling, softmax(·) denotes the softmax function, v_{i,k} denotes the probability that the i-th sentence is related to the k-th topic, d_{i,k} denotes the expansion vector obtained by expanding the dimension of the value v_{i,k}, s_i denotes the scores computed from the expansion vectors d_{i,k} and the encoding vector v_i, and MLP_d(·) denotes a multi-layer perceptron that maps a single probability value to a high-dimensional vector.
6. The dialogue emotion recognition method based on the convolution joint model according to claim 4, wherein the topic memory representation m_k is computed as:
m_k = MLP_m(T_k), k = 1, ..., K
wherein T_k is the k-th topic embedding of the topic feature matrix W_T, MLP_m(·) denotes a multi-layer perceptron that maps each topic embedding to a vector m_k of a fixed dimension, and K denotes the number of topics.
7. The dialogue emotion recognition method based on the convolution joint model according to claim 1, wherein step S6 specifically comprises:
obtaining the adjacency matrix A and the relation matrix R between nodes from the interconnections and interactions of the different nodes in the attention relation graph network model;
based on the adjacency matrix A and the relation matrix R, the attention relation graph network model G = (E, A, R) models the 2n initial nodes, where the edge between any node pair (ν_j, ν_k) in the node set E is mapped to one element a_{jk} of the adjacency matrix A, and the element a_{jk} can in turn be mapped to one element r_{jk} of the relation matrix R;
the values of the elements in the relation matrix R fall into three types: utterance-utterance, topic-topic and topic-utterance; these three types are the edge types of the attention relation graph network model, and each edge type comprises different kinds of values;
for utterance-utterance edges, eight kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C and whether the two sentences come from the same speaker: adjacent-future-self, adjacent-future-other, adjacent-past-self, adjacent-past-other, distant-future-self, distant-future-other, distant-past-self, distant-past-other;
for topic-topic edges, four kinds of values are obtained according to whether the sentence pair corresponding to the node pair (ν_j, ν_k) is adjacent in the dialogue C: adjacent-future, adjacent-past, distant-future, distant-past;
for topic-utterance edges, a single kind of value is introduced: influence.
8. The dialogue emotion recognition method based on the convolution joint model according to claim 7, wherein in step S7, for each node representation g_i^(l), the information of the other nodes g_j^(l) connected to node ν_i is aggregated into node ν_i, giving the updated node representation g_i^(l+1):
g_i^(l+1) = ReLU( W_0^(l)·g_i^(l) + Σ_{r∈R_i} Σ_{ν_j∈E_i} α_ij·W_r^(l)·g_j^(l) )
wherein g_i^(l+1) denotes the representation of node ν_i in layer l+1 of the attention relation graph network model, g_i^(l) is the representation of node ν_i in layer l, E_i is the set of other nodes connected to node ν_i, R_i is the set of possible values of the relation matrix R for edges whose starting node is ν_i, W_0^(l) is the self-connection matrix of layer l of the attention relation graph network model, W_r^(l) is the matrix of layer l that extracts the information of other nodes ν_j under relation r given the current node ν_i, g_j^(l) and g_j^(l+1) are the representations of node ν_j in layers l and l+1, α_ij denotes the coefficient with which the information of node ν_j is aggregated into node ν_i at layer l+1, β_ij indicates whether nodes ν_i and ν_j are connected (0 if connected, 1 otherwise), β_ik likewise indicates whether nodes ν_i and ν_k are connected (0 if connected, 1 otherwise), the coefficients α_ij are obtained by a softmax over all nodes in which unconnected nodes are masked out through β, E denotes the set of all nodes in the attention relation graph network model, and ν_k denotes the k-th node in E.
9. The dialogue emotion recognition method based on the convolution joint model according to claim 7, wherein the enhanced hidden vector o_i is computed as:
o_i = [ g_i^(L) ; h_i ]
and the prediction vector y_i is computed as:
y_i = softmax( W_d·o_i + b_d )
wherein the value of each dimension of y_i represents the score of the emotion category represented by that dimension, W_d and b_d are trainable parameters that map the dimension of the enhanced hidden vector o_i to the number of emotion categories, and [ ; ] denotes concatenation between tensors.
10. The dialogue emotion recognition method based on the convolution joint model according to claim 9, wherein the convolution joint model is trained with a loss function until it converges to the optimal state, specifically:
the prediction vectors y_1, ..., y_n corresponding to all sentences X_1, ..., X_n of the input dialogue C are aggregated into a prediction vector set;
the cross entropy between the prediction vector set and the true emotion category set Ŷ = {ŷ_1, ..., ŷ_n} corresponding to the input dialogue C is computed as the loss function of the convolution joint model, and the convolution joint model reaches the convergence state by minimizing the loss function;
the loss function ℒ is:
ℒ = − Σ_{i=1}^{n} Σ_{j=1}^{m} ŷ_{i,j} · log p_{i,j}
wherein m is the total number of emotions, p_{i,j} denotes the probability, predicted by the convolution joint model, that sentence X_i belongs to the j-th emotion y_j of the emotion category set Y, and ŷ_{i,j} indicates whether the true emotion category of sentence X_i is y_j: ŷ_{i,j} is 1 if it is and 0 otherwise.
CN202310443460.0A 2023-04-24 2023-04-24 Dialogue emotion recognition method based on convolution joint model Active CN116258134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310443460.0A CN116258134B (en) 2023-04-24 2023-04-24 Dialogue emotion recognition method based on convolution joint model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310443460.0A CN116258134B (en) 2023-04-24 2023-04-24 Dialogue emotion recognition method based on convolution joint model

Publications (2)

Publication Number Publication Date
CN116258134A CN116258134A (en) 2023-06-13
CN116258134B true CN116258134B (en) 2023-08-29

Family

ID=86679580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310443460.0A Active CN116258134B (en) 2023-04-24 2023-04-24 Dialogue emotion recognition method based on convolution joint model

Country Status (1)

Country Link
CN (1) CN116258134B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200119410A (en) * 2019-03-28 2020-10-20 한국과학기술원 System and Method for Recognizing Emotions from Korean Dialogues based on Global and Local Contextual Information
CN112445898A (en) * 2019-08-16 2021-03-05 阿里巴巴集团控股有限公司 Dialogue emotion analysis method and device, storage medium and processor
WO2021132797A1 (en) * 2019-12-27 2021-07-01 한국과학기술원 Method for classifying emotions of speech in conversation by using semi-supervised learning-based word-by-word emotion embedding and long short-term memory model
WO2021139107A1 (en) * 2020-01-10 2021-07-15 平安科技(深圳)有限公司 Intelligent emotion recognition method and apparatus, electronic device, and storage medium
CN114385802A (en) * 2022-01-10 2022-04-22 重庆邮电大学 Common-emotion conversation generation method integrating theme prediction and emotion inference
CN114911932A (en) * 2022-04-22 2022-08-16 南京信息工程大学 Heterogeneous graph structure multi-conversation person emotion analysis method based on theme semantic enhancement
CN115600581A (en) * 2022-12-13 2023-01-13 中国科学技术大学(Cn) Controlled text generation method using syntactic information
CN115841119A (en) * 2023-02-21 2023-03-24 中国科学技术大学 Emotional cause extraction method based on graph structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dialogue Sentiment Analysis Based on a Neural Topic Model; 王建成 (Wang Jiancheng), 徐扬 (Xu Yang), 刘启元 (Liu Qiyuan), 吴良庆 (Wu Liangqing), 李寿山 (Li Shoushan); Journal of Chinese Information Processing (No. 01); full text *

Also Published As

Publication number Publication date
CN116258134A (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN113254803B (en) Social recommendation method based on multi-feature heterogeneous graph neural network
CN109543180B (en) Text emotion analysis method based on attention mechanism
CN109284506B (en) User comment emotion analysis system and method based on attention convolution neural network
CN109472031B (en) Aspect level emotion classification model and method based on double memory attention
CN107145977B (en) Method for carrying out structured attribute inference on online social network user
CN112508085B (en) Social network link prediction method based on perceptual neural network
CN108363695B (en) User comment attribute extraction method based on bidirectional dependency syntax tree representation
CN111274375B (en) Multi-turn dialogue method and system based on bidirectional GRU network
WO2019165944A1 (en) Transition probability network based merchant recommendation method and system thereof
CN111274398A (en) Method and system for analyzing comment emotion of aspect-level user product
CN111400494B (en) Emotion analysis method based on GCN-Attention
CN114443827A (en) Local information perception dialogue method and system based on pre-training language model
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN112667818A (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN115841119B (en) Emotion cause extraction method based on graph structure
CN110472245B (en) Multi-label emotion intensity prediction method based on hierarchical convolutional neural network
CN109614611B (en) Emotion analysis method for fusion generation of non-antagonistic network and convolutional neural network
CN111444399B (en) Reply content generation method, device, equipment and readable storage medium
CN112347756A (en) Reasoning reading understanding method and system based on serialized evidence extraction
CN112612871A (en) Multi-event detection method based on sequence generation model
CN114625882B (en) Network construction method for improving unique diversity of image text description
CN110910235A (en) Method for detecting abnormal behavior in credit based on user relationship network
CN114036298A (en) Node classification method based on graph convolution neural network and word vector
Zhang et al. TS-GCN: Aspect-level sentiment classification model for consumer reviews

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant