CN116484004A - Dialogue emotion recognition and classification method - Google Patents

Dialogue emotion recognition and classification method

Info

Publication number
CN116484004A
CN116484004A
Authority
CN
China
Prior art keywords
emotion
sentence
node
statement
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310607292.4A
Other languages
Chinese (zh)
Other versions
CN116484004B (en)
Inventor
徐博
李龙娇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202310607292.4A
Publication of CN116484004A
Application granted
Publication of CN116484004B
Status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Clustering; Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a dialogue emotion recognition and classification method, which comprises the steps of: preprocessing a dialogue emotion recognition data set; extracting sentence-level features from the sentences in the preprocessed dialogue emotion recognition data set; extracting emotion-induced events from the sentences in the preprocessed data set to obtain semantic representations of the emotion-induced events; constructing an emotion-psychological-characterization heterogeneous session graph from the sentence-level features of the sentences and the semantic representations of the emotion-induced events, and obtaining a feature matrix of the node input features and adjacency matrices of the edge connections between nodes; extracting session-level features from the feature matrix and adjacency matrices through a graph encoder; and concatenating the sentence-level features with the session-level features and inputting the result into a feedforward neural network to obtain the emotion classification. The invention can better identify the emotion categories in a dialogue and improves the accuracy of dialogue emotion recognition.

Description

Dialogue emotion recognition and classification method
Technical Field
The invention belongs to the technical field of intelligent recognition, and particularly discloses a dialogue emotion recognition and classification method.
Background
Natural language, as the main communication medium of human beings, plays a very important role in the process of emotion generation, so emotion recognition has broad application prospects in fields such as opinion mining, social media, and recommendation systems. In recent years, in order to develop artificial intelligence capable of understanding human emotion, conversational emotion recognition (ERC) has attracted the attention of natural language processing researchers and become a popular research field. Conversational emotion recognition aims to identify the emotion of each utterance in a conversation, which helps to generate emotion-aware dialogue and to develop empathetic conversational agents or chatbots for psychological counseling. In addition, ERC has potential applications in social media analysis, opinion mining, consumer feedback in real-time conversations, legal decisions, e-health medical services, education, and so on. Conversational emotion recognition differs from emotion recognition on traditional text such as sentences and documents: it requires not only the semantic information of the utterances themselves but also context modeling for each utterance. Early studies of conversational emotion recognition, such as lexicon-based and deep-learning-based methods, ignored conversation-specific factors such as contextual cues, the chronological order of utterances, and speaker-specific information. Recent studies use sequence-based or graph-based approaches to model, as far as possible, the interactions between the utterance context and the speakers. Building on these studies, researchers have incorporated variables that affect dialogue emotion, such as commonsense knowledge, psychological knowledge, dialogue acts, and topics, into their models.
While previous studies have made great progress on the conversational emotion recognition task, they ignore the effect on emotion recognition of the psychological characterizations that trigger the speaker's emotion. Methods that model mental states with an external commonsense knowledge base depend heavily on the size, coverage, and construction quality of that knowledge base; and although pre-trained models help to generate commonsense knowledge for events unseen by the knowledge base, the common sense of every person in real life is not necessarily the same. The psychologist Rainer Reisenzein studies language-emotion interactions from the standpoint of a computational psychological-characterization theory of emotion, holding that emotion depends not only on cognitive psychological characterizations (i.e., cognitive or information states) but also on motivational psychological characterizations (i.e., motivational states). It can be said that the psychological-characterization theory of emotion on which the computational theory rests is itself an explication of the commonsense psychology of emotion underlying language. Humans possess complex psychological characterizations: in daily conversational communication one can know that, for a speaker, "a psychological characterization has just been confirmed by a certain event" or "a psychological characterization has just been fulfilled by a certain event", so emotions about states of affairs result from the confirmation or disconfirmation of a cognitive characterization and the fulfillment or frustration of a motivational characterization. Psychological characterizations are related to emotion not only causally but also semantically: the belief that p, the desire for p, and the pleasure about p share a common object, for all of them involve p. Therefore, an emotion classification method that, based on the psychological-characterization theory of emotion, applies to sentences a heterogeneous graph neural network containing sentences, speakers, and emotion-induced events can better identify the emotion in a dialogue.
Disclosure of Invention
The invention provides a dialogue emotion recognition and classification method to solve the problems that existing dialogue emotion recognition and classification methods ignore the effect on emotion recognition of the psychological characterizations that trigger the speaker's emotion and suffer from low classification accuracy.
The invention provides a dialogue emotion recognition and classification method, which comprises the following steps:
s1, preprocessing a dialogue emotion recognition data set, removing irrelevant information of sentences in the dialogue emotion recognition data set, and improving the quality of texts;
s2, extracting sentence-level features of the sentences in the dialogue emotion recognition data set after preprocessing in the step S1 to obtain sentence-level features of the sentences;
s3, extracting emotion triggering events from the sentences in the dialogue emotion recognition data set preprocessed in the step S1 to obtain semantic representations of the emotion triggering events;
s4, constructing a emotion-psychology characterization heterogeneous session graph through the sentence level features of the sentences obtained in the step S2 and the semantic representation of the emotion-induced events obtained in the step S3, and obtaining a feature matrix of each node input feature and an adjacent matrix of the edge connection relationship between the nodes;
s5, extracting session-level features of the feature matrix and the adjacent matrix obtained in the step S4 through a graph encoder to obtain session-level features;
s6, fully connecting the sentence-level features obtained in the step S2 and the session-level features obtained in the step S5, and inputting the fully connected result into a feedforward neural network to obtain emotion classification.
Still further, in step S1, the dialogue emotion recognition data set includes one or more of IEMOCAP, DailyDialog, MELD, and EmoryNLP; the preprocessing includes one or more of removing abbreviations, removing non-alphabetic symbols, removing special symbols, removing abbreviations of proper nouns, and removing redundant space characters.
Furthermore, in step S2, sentence-level feature extraction is performed on the sentences in the dialogue emotion recognition data set using the language model RoBERTa-Large. A [CLS] token is prepended to each sentence $u_i$ in the data set, giving the input sequence $[\mathrm{CLS}], w_1, w_2, \cdots, w_L$, which is input into RoBERTa to obtain the sentence-level feature $h_{u_i}$, as shown in formula (1):
$h_{u_i} = \mathrm{RoBERTa}([\mathrm{CLS}], w_1, w_2, \cdots, w_L)$ (1)
where $w_L$ denotes the $L$-th word of sentence $u_i$.
Further, the language model RoBERTa-Large used in step S2 has a 24-layer architecture with 16 self-attention heads in each block, a hidden dimension of 1024, and 355M parameters in total.
Further, the step S3 includes:
s301, dividing sentences in the dialogue emotion recognition data set into simple clauses according to subordinate conjunctions and conjunctions;
s302, designing a plurality of event modes to match and extract emotion guidingEvent, find sentence u i The method comprises the steps of including the dependency relationship of simple clauses of each verb v, then matching the dependency relationship with a plurality of designed event modes one by one, and for each mode, taking the verb v as a starting point, finding all positive dependency relationship sides, wherein the positive dependency relationship sides and words connected by the positive dependency relationship sides are potential sides and words of effective emotion triggering events;
s303, adding optional dependency edges and words connected by the optional dependency edges through the dependency to form a dependency graph;
s304, checking whether a negative dependency relationship side can be found in the dependency graph, if not, reserving the current dependency relationship side and the word as effective emotion triggering events, otherwise, not reserving;
s305, encoding the extracted effective emotion-induced events by using RoBERTa, and obtaining the maximum pooling of the last layer of hidden states to obtain semantic representation of the emotion-induced eventsAs shown in formula (2):
where e is a valid emotion-inducing event.
Further, the step S4 includes:
s401, constructing statement nodes, speaker nodes and emotion triggering event nodes of an emotion-psychological characterization heterogeneous session map; each target sentence in the dialogue is used as a sentence node, and the characteristics of the sentence node are initializedIs a statement level feature of a statement, as shown in equation (3):
each speech in the dialogueWith a speaker as a speaker node, feature initialization of the speaker nodeThe average of the semantic features of all the expressed sentences of the speaker in the conversation is shown as formula (4):
wherein avg () is an averaging function;
using the emotion-induced event extracted from each sentence in the dialogue as an emotion-induced event node, and initializing the characteristics of the emotion-induced event nodeAs shown in formula (5):
the node set is shown in formula (6):
V=u i ∪Unique(s j )∪Unique(e y ) (6)
wherein V is a node set, u i Is the ith statement node, s j For the j-th speaker node, e y For the y-th emotion-induced event node, unique () is a deduplication function;
s402, constructing statement-statement edges, statement-speaker edges and statement-emotion triggering event edges of emotion-psychological representation heterogeneous session diagrams; connecting each target sentence with the last sentence of all speakers before it, the sentence-sentence edge modeling the effect of past sentences on the current sentence, the sentence-sentence edge E uu As shown in formula (7):
E uu =(u i ,u t ),t>i (7)
wherein u is i For the ith statement node, u t Is the t statement node;
concatenating each target statement with theA speaker corresponding to a sentence, the sentence-speaker edge modeling the effect of the speaker on the sentence, the sentence-speaker edge E su As shown in formula (8):
E su =(s j ,u i ) (8)
wherein s is j For the j-th speaker node, u i Is the ith statement node;
connecting each target sentence with an emotion-induced event extracted from the sentence, the sentence-emotion-induced event edge modeling the influence of psychometric information of the emotion-induced event on the emotion, the sentence-emotion-induced event edge being as shown in formula (9):
E eu =(e t ,u i ) (9)
wherein e t For the t-th emotion-induced event node, u i Is the ith statement node;
the edge set is shown in formula (10):
E=E uu ∪E su ∪E eu (10)
wherein E is an edge set;
s403, after constructing the heterogeneous session graph, obtaining a feature matrix X representing the input features of each node and an adjacency matrix { A } of the connection relationship of the edges between the nodes k The feature matrix X is an N X d-dimensional matrix formed by the features of each node, N is the number of all nodes, d is the feature vector dimension of each node, and the adjacent matrix { A } k N x N dimensional matrix set representing edge relationships between nodes, A k Is the adjacency matrix of the kth dependency edge.
Further, the step S5 includes:
s501, flexibly selecting an adjacent matrix from an adjacent matrix set A of a heterogeneous graph G by using a layer-I graph conversion layer, and passing through two selected adjacent matrices A 1 And A 2 Learning a new primitive path graph, learning different node representations through a plurality of different heterogeneous graph G structures, learning a plurality of primitive path graphs after stacking l-layer graph conversion layers, and performing graph convolution neural on each primitive path graphThe network carries out graph convolution, and the propagation mode between layers of the graph convolution neural network is shown as a formula (11):
wherein X is (l+1) For the feature matrix of layer l +1, sigma is a nonlinear activation function,a is an adjacent matrix, I is an identity matrix,>is->W is a trainable weight matrix sharing a cross-channel, W ε R d×d Is a real matrix of d x d dimensions, d being the feature vector dimension of each node;
s502, fully connecting a plurality of node representations from the same graph convolution neural network on a plurality of element path graphs to obtain session-level features of sentence nodes, wherein the session-level features are shown in a formula (12):
wherein H is a session level feature, ||is a full connection operation, C is the number of channels,is from->Adjacent matrix of the ith channel, +.>For the adjacency matrix of the first layer, +.>Is->W is a trainable weight matrix sharing a cross-channel.
Further, in step S6, the sentence-level feature and the session-level feature of the sentence node are concatenated, as shown in formula (13):
$z_i = h_{u_i} \,\Vert\, H_i$ (13)
where $\Vert$ is the concatenation operation, $h_{u_i}$ is the sentence-level feature of sentence $u_i$, and $H_i$ is the session-level feature of sentence $u_i$;
the concatenated result is input into a feedforward neural network, which is trained and optimized with a cross-entropy loss function and the Adam optimizer to finally obtain the emotion classification result, as shown in formulas (14) and (15):
$p_{x,i} = \mathrm{Softmax}(W_z z_i + b_z)$ (14)
$y_{x,i} = \mathrm{Argmax}(p_{x,i})$ (15)
where $y_{x,i}$ is the predicted emotion label of the $i$-th sentence in dialogue $x$, $z_i$ is the final sentence representation, $W_z$ and $b_z$ are trainable parameters, and $p_{x,i}$ is the predicted probability distribution over the emotion labels of the $i$-th sentence in dialogue $x$.
The dialogue emotion recognition and classification method provided by the invention models the semantic information of sentences, the context and sequential information of the dialogue, the global information of speakers, and the psychological characterizations related to emotion-induced events, and takes into account the effect on emotion recognition of the psychological characterizations that trigger the speaker's emotion, so that the emotion categories of dialogue sentences can be better identified and the accuracy of dialogue emotion recognition is improved.
Drawings
Fig. 1 is a flow chart of a dialog emotion recognition and classification method according to the present invention.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are illustrative of the invention but are not intended to limit the scope of the invention.
A dialogue emotion recognition and classification method is shown in fig. 1, and comprises the following steps:
s1, preprocessing a dialogue emotion recognition data set, removing irrelevant information of sentences in the dialogue emotion recognition data set, and improving the quality of a text;
in particular, the dialog emotion recognition dataset includes one or more of IEMOCAP, dailyDialog, MELD and EmoryNLP; the preprocessing includes one or more of removing abbreviations, removing non-alphabetic symbols, removing special symbols, removing abbreviations for proper nouns, and removing redundant space characters.
S2, extracting sentence-level features of the sentences in the dialogue emotion recognition data set preprocessed in the step S1 to obtain sentence-level features of the sentences;
specifically, the RoBERTa-Large is used for extracting sentence-level features of sentences in the dialogue emotion recognition data set, and the method is specifically implemented as follows: the sentences in the dialogue emotion recognition data set are stored in dictionary types, and text and speaker are respectively corresponding to sentence texts and corresponding speaker names in keys of each sentence in the dialogue. Each sentence u in the dialog emotion recognition dataset using the language model Roberta-Large i Beginning of add mark [ CLS ]]Let the input sequence be [ CLS ]],w 1 ,w 2 ,···,w L Inputting the semantic level representation of the Roberta-Large extraction statement by using a transformer library of Huggingface, and performing word segmentation pretreatment by using a RobertaTokenizer class and a from_preimpregnated () method to return a PyTorch tensor; output using RobertaModel class and from_preimpregnated method to obtain [ CLS ]]The sentence-level features serving as sentences are embedded in the pooling of 1024-dimensional hidden states of the last layer to obtain the sentence-level featuresAs shown in formula (1):
wherein w is L Representation sentence u i Is the L-th word of (2).
The language model RoBERTa-Large used has a 24-layer architecture with 16 self-attention heads in each block, a hidden dimension of 1024, and a total of 355M parameters.
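The description above maps directly onto the transformers classes it names; the sketch below follows that recipe (the pretrained handle "roberta-large" and the use of the first-position hidden state come from the text, while the wrapper function itself is an assumed minimal implementation):

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
encoder = RobertaModel.from_pretrained("roberta-large")
encoder.eval()

def sentence_feature(utterance: str) -> torch.Tensor:
    """Return the 1024-dim sentence-level feature of formula (1).

    RobertaTokenizer prepends RoBERTa's <s> token, which plays the role of
    the [CLS] marker described in step S2.
    """
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # shape (1, L+2, 1024)
    return hidden[0, 0]                               # hidden state at the [CLS]/<s> position

h_u = sentence_feature("I can't believe we actually won!")
print(h_u.shape)  # torch.Size([1024])
```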
S3, extracting emotion-induced events from sentences in the dialogue emotion recognition data set preprocessed in the step S1 to obtain semantic representation of the emotion-induced events;
speech communication is the primary source of general and specific background psychological characterizations required to calculate a psychological characterization of a specific emotion-causing event, which is semantically related to emotion, and thus extracting the causing emotional event from the target utterance to determine the confirmation or non-confirmation of the psychological characterization and satisfaction or frustration of the psychological characterization. To ensure that all extracted emotion-induced events are semantically complete and not overly complex, 18 designed event patterns are used to extract emotion-induced events by pattern matching, the 18 event patterns are shown in table 1:
Table 1. The 18 event patterns (the table body is an image in the original and is not reproduced here)
Each pattern contains three kinds of dependency edges: positive dependency edges, optional dependency edges, and negative dependency edges. Six further dependencies, advmod, amod, nummod, aux, compound, and neg, are optional dependency edges that may attach to any selected pattern. All positive and optional dependency edges that do not belong to the current pattern are treated as negative dependency edges, with the aim that every extracted emotion-induced event is semantically complete and all patterns are mutually exclusive and independent of one another.
Specifically, step S3 includes:
s301, considering that a sentence possibly contains a plurality of emotion triggering events, dividing the sentence into simple clauses according to a component tree, and following an utterance analysis system to detect possible separators by using a connection classifier, thereby dividing the sentence in the dialogue emotion recognition data set into simple clauses according to subordinate conjunctions and conjunctions;
s302, designing a plurality of event modes to match and extract emotion-induced events, and finding a sentence u i The method comprises the steps of including the dependency relationship of simple clauses of each verb v, then matching the dependency relationship with a plurality of designed event modes one by one, and for each mode, taking the verb v as a starting point, finding all positive dependency relationship sides, wherein the positive dependency relationship sides and words connected by the positive dependency relationship sides are potential sides and words of effective emotion triggering events;
s303, adding optional dependency edges and words connected by the optional dependency edges through the dependency to form a dependency graph;
s304, checking whether a negative dependency relationship side can be found in the dependency graph, if not, reserving the current dependency relationship side and the word as effective emotion triggering events, otherwise, not reserving;
s305, encoding the extracted effective emotion-induced events by using RoBERTa, and obtaining the maximum pooling of the last layer of hidden states to obtain semantic representation of the emotion-induced eventsAs shown in formula (2):
where e is a valid emotion-inducing event.
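A simplified sketch of the pattern matching of steps S301-S304, written with spaCy's dependency parser. The two patterns below merely stand in for the 18 designed event patterns of Table 1, and treating every non-pattern, non-optional edge as negative follows the description above; all names here are assumptions:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the en_core_web_sm model is installed

# stand-ins for the 18 designed patterns (each is a set of positive dependency edges)
PATTERNS = [frozenset({"nsubj", "dobj"}), frozenset({"nsubj", "xcomp"})]
OPTIONAL = {"advmod", "amod", "nummod", "aux", "compound", "neg"}

def extract_events(clause: str) -> list[str]:
    """Return candidate emotion-induced events: a verb plus its pattern arguments."""
    events = []
    for token in nlp(clause):
        if token.pos_ != "VERB":
            continue
        deps = {child.dep_ for child in token.children}
        for pattern in PATTERNS:
            if not pattern <= deps:
                continue                  # positive edges of this pattern are absent
            if deps - pattern - OPTIONAL:
                continue                  # a negative edge is present: discard candidate
            keep = [token] + [c for c in token.children
                              if c.dep_ in pattern or c.dep_ in OPTIONAL]
            events.append(" ".join(t.text for t in sorted(keep, key=lambda t: t.i)))
    return events

print(extract_events("She finally passed the bar exam"))
```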
S4, constructing an emotion-psychological-characterization heterogeneous session graph from the sentence-level features obtained in step S2 and the semantic representations of the emotion-induced events obtained in step S3, and obtaining a feature matrix of the node input features and adjacency matrices of the edge connections between nodes;
specifically, step S4 includes:
s401, constructing statement nodes, speaker nodes and emotion triggering event nodes of an emotion-psychological characterization heterogeneous session map; each target sentence in the dialogue is used as a sentence node, and the characteristics of the sentence node are initializedIs a statement level feature of a statement, as shown in equation (3):
each speaker in the conversation is used as a speaker node, and the characteristics of the speaker node are initializedThe average of the semantic features of all the expressed sentences of the speaker in the conversation is shown as formula (4):
wherein avg () is an averaging function;
using the emotion-induced event extracted from each sentence in the dialogue as an emotion-induced event node, and initializing the characteristics of the emotion-induced event nodeAs shown in formula (5):
the node set is shown in formula (6):
V=u i ∪Unique(s j )∪Unique(e y ) (6)
wherein V is a node set, u i Is the ith statement node, s j For the j-th speaker node, e y For the y-th emotion-induced event node, unique () is a deduplication function;
s402, constructing statement-statement edges, statement-speaker edges and statement-emotion triggering event edges of emotion-psychological representation heterogeneous session diagrams; we consider that the last sentence of each speaker before the target sentence has the greatest effect on the context of the target sentence, and that the other effects are smaller, and furthermore, it is notable that the edges between sentences are unidirectional, and that in real life, the emotion expressed by the current utterance is only affected by the previous utterance. So when constructing a statement-statement edge, connect each target statement with the last statement of all speakers before it, the statement-statement edge models the effect of past statements on the current statement, statement-statement edge E uu As shown in formula (7):
E uu =(u i ,u t ),t>i (7)
wherein u is i For the ith statement node, u t Is the t statement node;
connecting each target sentence and the corresponding speaker of the sentence, modeling the influence of the speaker on the sentence by the sentence-speaker side, and modeling the effect of the speaker on the sentence-speaker side E su As shown in formula (8):
E su =(s j ,u i ) (8)
wherein s is j For the j-th speaker node, u i Is the ith statement node;
connecting each target sentence with the emotion-induced event extracted from the sentence, and modeling the influence of psychological characterization information of the emotion-induced event on the emotion by the sentence-emotion-induced event edge, wherein the sentence-emotion-induced event edge is shown in a formula (9):
E eu =(e t ,u i ) (9)
wherein e t For the t-th emotion-induced event node, u i Is the ith statement node;
the edge set is shown in formula (10):
E=E uu ∪E su ∪E eu (10)
wherein E is an edge set;
s403, after constructing the heterogeneous session graph, obtaining a feature matrix X representing the input features of each node and an adjacency matrix { A } of the connection relationship of the edges between the nodes k The feature matrix X is an N X d-dimensional matrix formed by the features of each node, N is the number of all nodes, d is the feature vector dimension of each node, and the adjacent matrix { A } k N x N dimensional matrix set representing edge relationships between nodes, A k Is the adjacency matrix of the kth dependency edge.
S5, extracting session-level features from the feature matrix and adjacency matrices obtained in step S4 through a graph encoder;
specifically, step S5 includes:
s501, constructing Graph Transformer Network by using PyToch frame, setting C convolved output channels to simultaneously consider multiple element paths, flexibly selecting an adjacent matrix from an adjacent matrix set A of a heterogeneous graph G by using a layer-I graph conversion layer, and passing through two selected adjacent matrices A 1 And A 2 Learning a new element path diagram, learning different node representations through a plurality of different heterogeneous diagram G structures, learning a plurality of element path diagrams after stacking l layers of diagram conversion layers, and performing diagram convolution on a diagram convolution neural network on each element path diagram, wherein the propagation mode between the layers of the diagram convolution neural network is shown as a formula (11):
wherein X is (l+1) For the feature matrix of layer l +1, sigma is a nonlinear activation function,a is an adjacent matrix, I is an identity matrix,>is->W is a trainable weight matrix sharing a cross-channel, W ε R d×d Is a real matrix of d x d dimensions, d being the feature vector dimension of each node;
s502, fully connecting a plurality of node representations from the same graph convolution neural network on a plurality of element path graphs to obtain session-level features of sentence nodes, wherein the session-level features are shown in a formula (12):
wherein H is a session level feature, ||is a full connection operation, C is the number of channels,is from->Adjacent matrix of the ith channel, +.>For the adjacency matrix of the first layer, +.>Is->W is a trainable weight matrix sharing a cross-channel.
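A minimal sketch of the propagation rule of formula (11) and the channel concatenation of formula (12); the soft meta-path selection performed by the graph transformer layers is omitted here, the meta-path adjacency matrices are assumed to be given, and ReLU stands in for the unspecified nonlinearity σ:

```python
import torch
import torch.nn as nn

class MetaPathGCN(nn.Module):
    """Shared-weight graph convolution over C meta-path graphs (formulas (11)-(12))."""

    def __init__(self, d: int):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)            # trainable W shared across channels

    def propagate(self, A: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        A_tilde = A + torch.eye(A.size(0))              # A~ = A + I
        D_inv = torch.diag(1.0 / A_tilde.sum(dim=1))    # D~^{-1}
        return torch.relu(D_inv @ A_tilde @ self.W(X))  # sigma(D~^{-1} A~ X W), formula (11)

    def forward(self, meta_path_adjs: list, X: torch.Tensor) -> torch.Tensor:
        # formula (12): concatenate the outputs of the C channels
        return torch.cat([self.propagate(A, X) for A in meta_path_adjs], dim=-1)
```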
S6, concatenating the sentence-level features obtained in step S2 with the session-level features obtained in step S5, and inputting the concatenated result into a feedforward neural network to obtain the emotion classification.
Specifically, the sentence-level feature and the session-level feature of the sentence node are concatenated, as shown in formula (13):
$z_i = h_{u_i} \,\Vert\, H_i$ (13)
where $\Vert$ is the concatenation operation, $h_{u_i}$ is the sentence-level feature of sentence $u_i$, and $H_i$ is the session-level feature of sentence $u_i$;
the concatenated result is input into a feedforward neural network, which is trained and optimized with a cross-entropy loss function and the Adam optimizer to finally obtain the emotion classification result, as shown in formulas (14) and (15):
$p_{x,i} = \mathrm{Softmax}(W_z z_i + b_z)$ (14)
$y_{x,i} = \mathrm{Argmax}(p_{x,i})$ (15)
where $y_{x,i}$ is the predicted emotion label of the $i$-th sentence in dialogue $x$, $z_i$ is the final sentence representation, $W_z$ and $b_z$ are trainable parameters, and $p_{x,i}$ is the predicted probability distribution over the emotion labels of the $i$-th sentence in dialogue $x$.
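The classification step then amounts to a concatenation followed by a linear feedforward layer trained with cross-entropy and Adam; a sketch under the assumption that a single linear layer realizes W_z and b_z (dimensions and label count are illustrative):

```python
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """Concatenate sentence- and session-level features and classify, formulas (13)-(15)."""

    def __init__(self, d_sent: int, d_sess: int, n_emotions: int):
        super().__init__()
        self.ffn = nn.Linear(d_sent + d_sess, n_emotions)   # realizes W_z, b_z

    def forward(self, h_u: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        z = torch.cat([h_u, H], dim=-1)                     # z_i = h_{u_i} || H_i, formula (13)
        return self.ffn(z)                                  # logits for formula (14)

model = EmotionClassifier(d_sent=1024, d_sess=1024, n_emotions=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()   # applies the Softmax of formula (14) internally
# prediction for formula (15): model(h_u, H).argmax(dim=-1)
```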
The training set data are used to train the method of this embodiment, and results are verified on the validation and test sets; the method of this embodiment is compared with the sequence-based dialogue emotion recognition and classification method DialogueRNN, the graph-based dialogue emotion recognition and classification methods RGAT-POS and DAG-ERC, and the commonsense-knowledge-based dialogue emotion recognition algorithms COSMIC and SKAIG-ERC, giving the results in Table 2:
table 2 comparison of experimental results of different algorithms
Performance on the data sets IEMOCAP and EmoryNLP is evaluated with weighted-F1, where a higher value indicates better classification of the samples. On both data sets the weighted-F1 of the method of this embodiment exceeds that of DialogueRNN, RGAT-POS, DAG-ERC, COSMIC, and SKAIG-ERC: it is 0.19 higher than the best baseline, DAG-ERC, on IEMOCAP and 1.3 higher than the best baseline, DAG-ERC, on EmoryNLP. Because the DailyDialog data set contains a large amount of data with neutral labels, performance on DailyDialog is evaluated with micro-F1 computed excluding sentences labeled neutral; micro-F1 is the proportion of correctly classified samples, and a higher value indicates better classification. On DailyDialog the micro-F1 of the method of this embodiment again exceeds that of DialogueRNN, RGAT-POS, DAG-ERC, COSMIC, and SKAIG-ERC, scoring 0.05 higher than the best baseline on that data set, SKAIG-ERC. It can therefore be seen that the method provided by this embodiment achieves a better classification effect, i.e., more accurate emotion recognition and classification results.
Inspired by the psychological-characterization theory of emotion in psychology, the method of this embodiment takes the effect of psychological characterization on emotion into account: it preprocesses the dialogue data, extracts sentence-level features with a pre-trained model, extracts the emotion-induced events in the sentences, and builds the sentences, speakers, and emotion-induced events into a heterogeneous graph that models the contextual interaction, sequential information, speaker information, and psychological-characterization information of the emotion-induced events of the dialogue; it then extracts session-level features with a graph transformer network and finally classifies emotion by combining the sentence-level and session-level features of the sentences, thereby realizing emotion recognition for the sentences in a dialogue. By modeling the semantic information of sentences, the context and sequential information of the dialogue, speaker-specific information, and the psychological characterizations related to emotion-induced events, the emotion categories of sentences in a dialogue can be better identified, and the accuracy of the dialogue emotion recognition results is ensured.
The embodiments of the invention have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (8)

1. A dialog emotion recognition and classification method, characterized by comprising the steps of:
s1, preprocessing a dialogue emotion recognition data set, and removing irrelevant information of sentences in the dialogue emotion recognition data set;
s2, extracting sentence-level features of the sentences in the dialogue emotion recognition data set after preprocessing in the step S1 to obtain sentence-level features of the sentences;
s3, extracting emotion triggering events from the sentences in the dialogue emotion recognition data set preprocessed in the step S1 to obtain semantic representations of the emotion triggering events;
s4, constructing a emotion-psychology characterization heterogeneous session graph through the sentence level features of the sentences obtained in the step S2 and the semantic representation of the emotion-induced events obtained in the step S3, and obtaining a feature matrix of each node input feature and an adjacent matrix of the edge connection relationship between the nodes;
s5, extracting session-level features of the feature matrix and the adjacent matrix obtained in the step S4 through a graph encoder to obtain session-level features;
s6, fully connecting the sentence-level features obtained in the step S2 and the session-level features obtained in the step S5, and inputting the fully connected result into a feedforward neural network to obtain emotion classification.
2. The dialogue emotion recognition and classification method as claimed in claim 1, wherein in step S1 the dialogue emotion recognition data set includes one or more of IEMOCAP, DailyDialog, MELD, and EmoryNLP; and the preprocessing includes one or more of removing abbreviations, removing non-alphabetic symbols, removing special symbols, removing abbreviations of proper nouns, and removing redundant space characters.
3. The dialogue emotion recognition and classification method as claimed in claim 1, wherein in step S2 sentence-level feature extraction is performed on the sentences in the dialogue emotion recognition data set using the language model RoBERTa-Large: a [CLS] token is prepended to each sentence $u_i$ in the data set, giving the input sequence $[\mathrm{CLS}], w_1, w_2, \cdots, w_L$, which is input into RoBERTa to obtain the sentence-level feature $h_{u_i}$, as shown in formula (1):
$h_{u_i} = \mathrm{RoBERTa}([\mathrm{CLS}], w_1, w_2, \cdots, w_L)$ (1)
where $w_L$ denotes the $L$-th word of sentence $u_i$.
4. The dialogue emotion recognition and classification method as claimed in claim 3, wherein the language model RoBERTa-Large used in step S2 has a 24-layer architecture with 16 self-attention heads in each block, a hidden dimension of 1024, and 355M parameters in total.
5. The dialogue emotion recognition and classification method as claimed in claim 3, wherein step S3 comprises:
s301, dividing sentences in the dialogue emotion recognition data set into simple clauses according to subordinate conjunctions and conjunctions;
s302, designing a plurality of event modes to match and extract emotion-induced events, and finding a sentence u i The method comprises the steps of including the dependency relationship of simple clauses of each verb v, then matching the dependency relationship with a plurality of designed event modes one by one, and for each mode, taking the verb v as a starting point, finding all positive dependency relationship sides, wherein the positive dependency relationship sides and words connected by the positive dependency relationship sides are potential sides and words of effective emotion triggering events;
s303, adding optional dependency edges and words connected by the optional dependency edges through the dependency to form a dependency graph;
s304, checking whether a negative dependency relationship side can be found in the dependency graph, if not, reserving the current dependency relationship side and the word as effective emotion triggering events, otherwise, not reserving;
s305, encoding the extracted effective emotion-induced events by using RoBERTa, and obtaining the maximum pooling of the last layer of hidden states to obtain semantic representation of the emotion-induced eventsAs shown in formula (2):
where e is a valid emotion-inducing event.
6. The method of claim 5, wherein the step S4 includes:
s401, constructing statement nodes, speaker nodes and emotion triggering event nodes of an emotion-psychological characterization heterogeneous session map; each target sentence in the dialogue is used as a sentence node, and the characteristics of the sentence node are initializedIs a statement level feature of a statement, as shown in equation (3):
each speaker in the conversation is used as a speaker node, and the characteristics of the speaker node are initializedThe average of the semantic features of all the expressed sentences of the speaker in the conversation is shown as formula (4):
wherein avg () is an averaging function;
using the emotion-induced event extracted from each sentence in the dialogue as an emotion-induced event node, and initializing the characteristics of the emotion-induced event nodeAs shown in formula (5):
the node set is shown in formula (6):
V=u i ∪Unique(s j )∪Unique(e y ) (6)
wherein V is a node set, u i Is the ith statement node, s j For the j-th speaker node, e y For the y-th emotion-induced event node, unique () is a deduplication function;
s402, constructing statement-statement edges, statement-speaker edges and statement-emotion triggering event edges of emotion-psychological representation heterogeneous session diagrams; connecting each target sentence with the last sentence of all speakers before it, the sentence-sentence edge modeling the effect of past sentences on the current sentence, the sentence-sentence edge E uu As shown in formula (7):
E uu =(u i ,u t ),t>i (7)
wherein u is i For the ith statement node, u t Is the t statement node;
connecting each target sentence with the speaker corresponding to the sentence, the sentence-speaker side modeling the speakerInfluence on sentence, the sentence-speaker edge E su As shown in formula (8):
E su =(s j ,u i ) (8)
wherein s is j For the j-th speaker node, u i Is the ith statement node;
connecting each target sentence with an emotion-induced event extracted from the sentence, the sentence-emotion-induced event edge modeling the influence of psychometric information of the emotion-induced event on the emotion, the sentence-emotion-induced event edge being as shown in formula (9):
E eu =(e t ,u i ) (9)
wherein e t For the t-th emotion-induced event node, u i Is the ith statement node;
the edge set is shown in formula (10):
E=E uu ∪E su ∪E eu (10)
wherein E is an edge set;
s403, after constructing the heterogeneous session graph, obtaining a feature matrix X representing the input features of each node and an adjacency matrix { A } of the connection relationship of the edges between the nodes k The feature matrix X is an N X d-dimensional matrix formed by the features of each node, N is the number of all nodes, d is the feature vector dimension of each node, and the adjacent matrix { A } k N x N dimensional matrix set representing edge relationships between nodes, A k Is the adjacency matrix of the kth dependency edge.
7. The method of classifying dialogue emotion recognition according to claim 6, wherein said step S5 includes:
s501, flexibly selecting an adjacent matrix from an adjacent matrix set A of a heterogeneous graph G by using a layer-I graph conversion layer, and passing through two selected adjacent matrices A 1 And A 2 Learning a new element path diagram, learning different node representations through a plurality of different heterogeneous diagram G structures, and after stacking l-layer diagram conversion layers, learning a plurality of element path diagramsPerforming graph convolution on each element path graph by using a graph convolution neural network, wherein the propagation mode between layers of the graph convolution neural network is as shown in a formula (11):
wherein X is (l+1) For the feature matrix of layer l +1, sigma is a nonlinear activation function,a is an adjacent matrix, I is an identity matrix,>is->W is a trainable weight matrix sharing a cross-channel, W ε R d×d Is a real matrix of d x d dimensions, d being the feature vector dimension of each node;
s502, fully connecting a plurality of node representations from the same graph convolution neural network on a plurality of element path graphs to obtain session-level features of sentence nodes, wherein the session-level features are shown in a formula (12):
wherein H is a session level feature, ||is a full connection operation, C is the number of channels,is from->Adjacent matrix of the ith channel, +.>For the adjacency matrix of the first layer, +.>Is->W is a trainable weight matrix sharing a cross-channel.
8. The method of claim 7, wherein in step S6 the sentence-level feature and the session-level feature of the sentence node are concatenated, as shown in formula (13):
$z_i = h_{u_i} \,\Vert\, H_i$ (13)
where $\Vert$ is the concatenation operation, $h_{u_i}$ is the sentence-level feature of sentence $u_i$, and $H_i$ is the session-level feature of sentence $u_i$;
the concatenated result is input into a feedforward neural network, which is trained and optimized with a cross-entropy loss function and the Adam optimizer to finally obtain the emotion classification result, as shown in formulas (14) and (15):
$p_{x,i} = \mathrm{Softmax}(W_z z_i + b_z)$ (14)
$y_{x,i} = \mathrm{Argmax}(p_{x,i})$ (15)
where $y_{x,i}$ is the predicted emotion label of the $i$-th sentence in dialogue $x$, $z_i$ is the final sentence representation, $W_z$ and $b_z$ are trainable parameters, and $p_{x,i}$ is the predicted probability distribution over the emotion labels of the $i$-th sentence in dialogue $x$.
CN202310607292.4A 2023-05-26 2023-05-26 Dialogue emotion recognition and classification method Active CN116484004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310607292.4A CN116484004B (en) 2023-05-26 2023-05-26 Dialogue emotion recognition and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310607292.4A CN116484004B (en) 2023-05-26 2023-05-26 Dialogue emotion recognition and classification method

Publications (2)

Publication Number Publication Date
CN116484004A true CN116484004A (en) 2023-07-25
CN116484004B CN116484004B (en) 2024-06-07

Family

ID=87227059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310607292.4A Active CN116484004B (en) 2023-05-26 2023-05-26 Dialogue emotion recognition and classification method

Country Status (1)

Country Link
CN (1) CN116484004B (en)

Citations (5)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050144013A1 (en) * 2003-11-20 2005-06-30 Jun Fujimoto Conversation control apparatus, conversation control method, and programs therefor
CN106874410A (en) * 2017-01-22 2017-06-20 清华大学 Chinese microblogging text mood sorting technique and its system based on convolutional neural networks
WO2022183138A2 (en) * 2021-01-29 2022-09-01 Elaboration, Inc. Automated classification of emotio-cogniton
CN114722838A (en) * 2022-04-11 2022-07-08 天津大学 Conversation emotion recognition method based on common sense perception and hierarchical multi-task learning
CN114911932A (en) * 2022-04-22 2022-08-16 南京信息工程大学 Heterogeneous graph structure multi-conversation person emotion analysis method based on theme semantic enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘喜凯 (Liu Xikai) et al., "A Dialogue Generation Model Based on a Retrieval Result Fusion Mechanism", Journal of Chinese Information Processing (《中文信息学报》), vol. 35, no. 7, 15 July 2021 (2021-07-15), pages 134-142 *

Also Published As

Publication number Publication date
CN116484004B (en) 2024-06-07

Similar Documents

Publication Publication Date Title
CN109493166B (en) Construction method for task type dialogue system aiming at e-commerce shopping guide scene
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
CN110825881B (en) Method for establishing electric power knowledge graph
Ghosh et al. Fracking sarcasm using neural network
CN110222163B (en) Intelligent question-answering method and system integrating CNN and bidirectional LSTM
CN107025284A (en) The recognition methods of network comment text emotion tendency and convolutional neural networks model
CN109726745B (en) Target-based emotion classification method integrating description knowledge
CN111581350A (en) Multi-task learning, reading and understanding method based on pre-training language model
CN111325029A (en) Text similarity calculation method based on deep learning integration model
CN113254675B (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
CN111581379B (en) Automatic composition scoring calculation method based on composition question-deducting degree
Solomon et al. Understanding the psycho-sociological facets of homophily in social network communities
CN116049387A (en) Short text classification method, device and medium based on graph convolution
CN112685550A (en) Intelligent question answering method, device, server and computer readable storage medium
CN111159405B (en) Irony detection method based on background knowledge
CN112215629B (en) Multi-target advertisement generating system and method based on construction countermeasure sample
Lin et al. Predicting performance outcome with a conversational graph convolutional network for small group interactions
CN116756347B (en) Semantic information retrieval method based on big data
Nair et al. Knowledge graph based question answering system for remote school education
Sheeba et al. A fuzzy logic based on sentiment classification
CN113239143A (en) Power transmission and transformation equipment fault processing method and system fusing power grid fault case base
CN115905187B (en) Intelligent proposition system oriented to cloud computing engineering technician authentication
CN116484004B (en) Dialogue emotion recognition and classification method
CN117216617A (en) Text classification model training method, device, computer equipment and storage medium
CN113722477B (en) Internet citizen emotion recognition method and system based on multitask learning and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant