CN113946670B - Contrast type context understanding enhancement method for dialogue emotion recognition - Google Patents

Contrast type context understanding enhancement method for dialogue emotion recognition

Info

Publication number
CN113946670B
Authority
CN
China
Prior art keywords
emotion
representation
context
model
session
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111217510.0A
Other languages
Chinese (zh)
Other versions
CN113946670A (en)
Inventor
Song Dawei (宋大为)
Zhang Hanqing (张寒青)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202111217510.0A priority Critical patent/CN113946670B/en
Publication of CN113946670A publication Critical patent/CN113946670A/en
Application granted granted Critical
Publication of CN113946670B publication Critical patent/CN113946670B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a contrast type context understanding enhancement method for dialogue emotion recognition, and belongs to the technical field of computer and information science. First, based on an existing dialogue emotion analysis framework, the hidden state sequence used for emotion classification is extracted. Then, based on the extracted sequence representation, comparison sample pairs containing context-aware semantic patterns are constructed. Next, a contrastive learning loss function is used so that the model learns the patterns contained in these samples, which enhances the model's understanding of the dialogue context. Finally, the contrast loss and the emotion classification loss function are added together and multi-task learning is performed to complete the training of the network model. The method is highly adaptable and can be flexibly embedded into existing emotion classification models, enabling them, to a certain extent, to judge emotion from the viewpoint of understanding the dialogue context, and it effectively improves the classification accuracy and the robustness to perturbations of existing models.

Description

Contrast type context understanding enhancement method for dialogue emotion recognition
Technical Field
The invention relates to a contrast type context understanding enhancement method for dialogue emotion recognition, and belongs to the technical field of computers and information science.
Background
The research goal of conversational emotion recognition (CER) is to identify the emotion of each utterance in a conversation. Effective dialogue emotion recognition is critical to the construction of dialogue systems. If a dialogue system can take the emotional state of the user into account, it can exhibit human-like empathy, which is of great value for improving the friendliness of its human-machine interaction. Accordingly, research on conversational emotion recognition has attracted increasing attention in recent years.
With the progress of deep learning technology, neural-network-based emotion recognition methods have achieved clear performance gains. Most existing approaches strive to build more effective utterance representations in order to better model the dialogue context. Specifically, the utterances in a dialogue are treated as a sequence, and sequence models commonly used in natural language processing, such as recurrent neural networks (RNNs), Transformers, and graph neural networks (e.g., GCNs), are used to aggregate the various emotional influences on each target utterance (such as inter-speaker influence, intra-speaker influence, topic, and personality), yielding a final utterance-level vector representation on which emotion classification is performed. However, the course of a dialogue is affected by many factors, such as its topic, intention, viewpoints, and argumentative logic, so it remains difficult for these methods to judge the emotion of the current utterance by genuinely understanding the context information; the classification accuracy and robustness of existing models are therefore limited to a certain extent.
To address this, the present application provides a contrast type context understanding enhancement method for dialogue emotion recognition. By introducing contrastive learning, an existing dialogue emotion classification model is forced to attend to the context information while performing emotion discrimination, which enhances its understanding of the dialogue context and improves the accuracy and robustness of its emotion classification.
Disclosure of Invention
The invention aims to provide a contrast type context understanding enhancement method for dialogue emotion recognition, addressing the technical problems of low classification accuracy and poor model robustness caused by insufficient understanding of the dialogue context in existing neural-network-based dialogue emotion recognition methods. By introducing contrastive learning, an existing dialogue emotion classification model is forced to attend to the context information while performing emotion discrimination, thereby enhancing its understanding of the dialogue context and improving the accuracy and robustness of its emotion classification.
The innovation of the invention lies in the following. First, based on an existing dialogue emotion analysis framework, the hidden state sequence used for emotion classification is extracted. Then, based on the extracted sequence representation, comparison sample pairs containing context-aware semantic patterns are constructed. Next, a contrastive learning loss function is used so that the model learns the patterns contained in these samples, which enhances the model's understanding of the dialogue context. Finally, the contrast loss and the emotion classification loss function are added together and multi-task learning is performed to complete the training of the network model.
The technical scheme of the invention is realized by the following steps.
A contrast type context understanding enhancement method for dialogue emotion recognition comprises the following steps:
Step 1: extract the emotion representation sequence from an existing emotion classification framework.
Specifically, the following method may be employed:
Step 1.1: vectorize the utterance text of the conversation to obtain a corresponding distributed text representation.
Step 1.2: feed the text representation from step 1.1 into an existing dialogue emotion classification model to obtain the emotion representation sequence before the model's fully connected classification layer.
Step 2: construct comparison sample pairs containing context-aware characteristics.
Specifically, the following method may be employed:
Step 2.1: encode the historical information of each target utterance's emotion representation to be classified to obtain an abstract representation of its context.
Step 2.2: use the target utterance itself and the adjacent utterances in the same direction as positive examples for the target's context representation, and use utterance emotion representations sampled from other, unrelated dialogues as negative examples, thereby completing the construction of the comparison sample pairs.
Step 2.3: repeat steps 2.1 and 2.2 in the opposite direction of the dialogue flow to construct corresponding comparison sample pairs for each target utterance.
Step 3: construct a contrast loss function and perform joint training with the original emotion classification framework.
Specifically, the following method may be employed:
Step 3.1: construct a contrast loss function so that, in the learned latent semantic space, the negative example pairs constructed in step 2 are pushed apart and the positive example pairs are pulled closer together.
Step 3.2: add the contrast loss function to the loss function of the original dialogue emotion classification framework and train jointly with the original network to obtain a new dialogue emotion classification model.
The obtained dialogue emotion classification model is then used to judge the emotion in the target dialogue and classify the dialogue emotion.
Advantageous effects
Compared with the prior art, the method has the following advantages:
The method is highly adaptable and can be flexibly embedded into existing emotion classification models, enabling them, to a certain extent, to judge emotion from the viewpoint of understanding the dialogue context, and it effectively improves the classification accuracy and the robustness to perturbations of existing models.
Drawings
FIG. 1 is a schematic diagram of the method of the present invention.
Fig. 2 is a process diagram of constructing the comparison sample pairs.
Detailed Description
For a better illustration of the objects and advantages of the invention, a more detailed description of the specific embodiments of the method of the invention will be given below with reference to examples.
A contrast type context understanding enhancement method for dialogue emotion recognition comprises the following steps:
Step 1: extract the emotion representation sequence from an existing emotion classification framework.
Specifically, step 1 includes the following steps.
Step 1.1: for a segment of text dialogue, the text content of the dialogue is mapped, using word2vec word embeddings, into a text sequence in vector form:

$U_l = \{u_1, u_2, \ldots, u_{T_l}\}$

where $u$ denotes the conversation text, $T_l$ denotes the number of conversation turns in the $l$-th dialogue segment of the training data, and each $u_t$ is the text information of one turn.
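A toy sketch of this vectorization step follows (Python/PyTorch, which the patent does not prescribe); the vocabulary, the nn.Embedding table standing in for pre-trained word2vec vectors, and the mean pooling are illustrative assumptions only.

```python
import torch
import torch.nn as nn

vocab = {"<unk>": 0, "i": 1, "am": 2, "so": 3, "happy": 4, "today": 5}   # toy vocabulary
embed = nn.Embedding(len(vocab), 300)   # stand-in for pre-trained word2vec vectors (dim 300)

def vectorize(utterance):
    # Map each token to its embedding and mean-pool into one distributed utterance vector u_t.
    ids = torch.tensor([vocab.get(tok, 0) for tok in utterance.lower().split()])
    return embed(ids).mean(dim=0)                      # shape: (300,)

# U_l for a two-turn toy dialogue, shape (T_l, 300)
U = torch.stack([vectorize(u) for u in ["I am so happy today", "So am I"]])
```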
Step 1.2: as shown in fig. 1, given an existing dialogue emotion classification model CER, the vectorized text sequence obtained in step 1.1 is fed into the emotion classification model, and the emotion sequence representation before the model's fully connected classification layer is obtained:

$H = \mathrm{CER}(U_l) = \{h_1, h_2, \ldots, h_{T_l}\}$

where $H$ is the emotion sequence representation, each $h_t$ is the emotion vector representation of the utterance at time $t$, and CER denotes the existing dialogue emotion classification model.
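The following minimal sketch illustrates step 1 as a whole. The class ExistingCERModel, its GRU encoder, and all dimensions are hypothetical stand-ins for whatever existing dialogue emotion classification model is being enhanced; only the idea of taking the hidden states before the classification layer comes from the method itself.

```python
import torch
import torch.nn as nn

class ExistingCERModel(nn.Module):
    """Placeholder for an existing dialogue emotion classifier (RNN/Transformer/GCN based)."""
    def __init__(self, input_dim=300, hidden_dim=128, num_emotions=7):
        super().__init__()
        self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)   # context encoder
        self.classifier = nn.Linear(hidden_dim, num_emotions)            # fully connected classification layer

    def forward(self, utterance_vectors):
        # utterance_vectors: (1, T_l, input_dim), the vectorized dialogue from step 1.1
        H, _ = self.encoder(utterance_vectors)     # H = {h_1, ..., h_{T_l}}: emotion sequence representation
        logits = self.classifier(H)                # used only for the original classification loss
        return H.squeeze(0), logits.squeeze(0)

model = ExistingCERModel()
dialogue = torch.randn(1, 10, 300)                 # toy word2vec-style utterance vectors, T_l = 10
H, logits = model(dialogue)                        # H (10, 128) is taken before the classification layer
```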
Step 2: as shown in fig. 2, comparison sample pairs containing context-aware characteristics are constructed based on the emotion sequence representation $H$ obtained in step 1.
Specifically, step 2 includes the following steps.
Step 2.1: using a sequence model $\overrightarrow{f}$, the historical utterance information of each target utterance emotion representation $h_k$ to be classified is encoded to obtain an abstract representation of its context:

$\overrightarrow{c}_k = \overrightarrow{f}(h_1, h_2, \ldots, h_{k-1})$

where $h_{k-1}$ is the emotion vector of the utterance at time $k-1$.
Step 2.2: the representations of the target utterance and of the $w$ utterances following it in the same direction are taken as positive examples of the target's context representation, forming the set of forward positive example pairs:

$P_k^{\rightarrow} = \{\, p_k = (\overrightarrow{c}_k, h_j) \mid k \le j \le k + w \,\}$

where $P_k^{\rightarrow}$ denotes the set of forward positive sample pairs, $p_k$ denotes a single forward positive pair, $h_{k+w}$ is the emotion vector of the utterance at time $k+w$ with $k+w < T_l$, and $k$ is the position of the target utterance in the conversation.
Similarly, a sequence model in the opposite direction, $\overleftarrow{f}$, encodes the future sequence information of the target utterance's emotion representation, giving the abstract context representation $\overleftarrow{c}_k = \overleftarrow{f}(h_{T_l}, h_{T_l-1}, \ldots, h_{k+1})$, from which the positive example pairs in the opposite direction are constructed:

$P_k^{\leftarrow} = \{\, (\overleftarrow{c}_k, h_j) \mid k - w \le j \le k \,\}$

where $P_k^{\leftarrow}$ denotes the set of backward positive sample pairs and $h_{k-w}$ is the emotion representation vector of the utterance at time $k-w$.
Together, these give the set $P_k$ of all positive sample pairs of the target utterance:

$P_k = P_k^{\rightarrow} \cup P_k^{\leftarrow}$
Step 2.3: combining the context representations $\overrightarrow{c}_k$ and $\overleftarrow{c}_k$ in the two directions obtained in step 2.2, utterance emotion representations sampled from other, unrelated dialogue data are taken as negative examples to construct the negative sample pairs:

$N_k^{\rightarrow} = \{\, n_k = (\overrightarrow{c}_k, \tilde{h}) \,\}, \qquad N_k^{\leftarrow} = \{\, n_k = (\overleftarrow{c}_k, \tilde{h}) \,\}$

where $\tilde{h}$ denotes an utterance emotion representation randomly sampled from another dialogue, $N_k^{\rightarrow}$ and $N_k^{\leftarrow}$ respectively denote the negative sample pair sets in the forward and backward directions, and $n_k$ is a single comparison sample pair therein.
The set of negative pairs $N_k$ for the target utterance $h_k$ is then:

$N_k = N_k^{\rightarrow} \cup N_k^{\leftarrow}$
$P_k$ and $N_k$ contain the patterns that enable the model to perceive the dialogue context. For the emotion representation of every utterance, the corresponding comparison sample pairs are obtained through the same process. Combining them with the contrast loss allows the model to learn the features contained in these comparison samples; a minimal implementation sketch is given below.
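A hedged sketch of the pair construction of step 2: the choice of GRUs for the forward and backward sequence models, the window w, the number of negatives, and the assumption that the target utterance has both history and future are illustrative, not prescribed by the method.

```python
import torch
import torch.nn as nn

hidden_dim, w = 128, 2
f_fwd = nn.GRU(hidden_dim, hidden_dim, batch_first=True)   # encodes the history h_1 ... h_{k-1}
f_bwd = nn.GRU(hidden_dim, hidden_dim, batch_first=True)   # encodes the future h_{T_l} ... h_{k+1}

def build_pairs(H, k, other_H, num_neg=4):
    """H: (T, d) emotion sequence of one dialogue; k: target index with 0 < k < T - 1;
    other_H: (T', d) emotion sequence taken from an unrelated dialogue."""
    T = H.size(0)
    # Step 2.1 / 2.3: abstract context representations in both directions.
    _, c_fwd = f_fwd(H[:k].unsqueeze(0))                    # history of the target utterance
    _, c_bwd = f_bwd(H[k + 1:].flip(0).unsqueeze(0))        # future, read in reverse order
    c_fwd, c_bwd = c_fwd.squeeze(), c_bwd.squeeze()
    # Step 2.2: positive pairs = target itself plus w neighbours in each direction.
    positives = [(c_fwd, H[j]) for j in range(k, min(k + w + 1, T))]
    positives += [(c_bwd, H[j]) for j in range(max(k - w, 0), k + 1)]
    # Negative pairs: emotion representations sampled from the unrelated dialogue.
    idx = torch.randint(0, other_H.size(0), (num_neg,))
    negatives = [(c_fwd, other_H[i]) for i in idx] + [(c_bwd, other_H[i]) for i in idx]
    return positives, negatives
```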
Step 3: construct a contrast loss function from the comparison sample pairs obtained in step 2, and perform joint training with the emotion classification framework.
Further, step 3 includes the following steps.
Step 3.1: construct a contrast loss function so that the negative example pairs among the comparison samples are pushed apart and the positive example pairs are pulled closer.
For a target utterance $h_k$, its corresponding comparison sample set is $D_k = \{P_k, N_k\}$. For any pair $(c_j, h_j) \in D_k$, the two representations are first concatenated, and a matching score $o_j$ between them is then computed by a fully connected multi-layer perceptron (MLP):

$o_j = \mathrm{MLP}([\, c_j ; h_j \,])$

where $c_j$ and $h_j$ respectively denote the context representation and the utterance vector representation in the comparison pair, $[\cdot\,;\cdot]$ denotes concatenation, and MLP is a fully connected perceptron network.
Then, the matching score $o_j$ is normalized into the range $(0, 1)$ by a sigmoid function:

$s_j = \mathrm{sigmoid}(o_j)$ (9)
Based on the matching scores computed for each sample pair, a contrast loss is constructed that increases the matching scores of positive sample pairs and decreases those of negative sample pairs:

$L_c(D_k) = -\dfrac{1}{|P_k|}\sum_{p \in P_k} \log s_p \;-\; \dfrac{1}{|N_k|}\sum_{n \in N_k} \log\bigl(1 - s_n\bigr)$

where $s_p$ and $s_n$ respectively denote the matching scores of the corresponding positive and negative sample pairs, $|P_k|$ denotes the number of positive pairs, $|N_k|$ denotes the number of corresponding negative pairs, $L_c$ is the contrast loss for each target utterance, and $D_k$ is its comparison sample set.
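A sketch of the matching network and contrast loss of step 3.1 follows. The MLP sizes are assumptions, and the binary cross-entropy form of the loss is a reconstruction consistent with the stated behaviour (raise positive-pair scores, lower negative-pair scores) rather than a verbatim formula.

```python
import torch
import torch.nn as nn

hidden_dim = 128
mlp = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))

def pair_scores(pairs):
    # Concatenate context and utterance representations, score with the MLP, squash with sigmoid.
    cat = torch.stack([torch.cat([c, h], dim=-1) for c, h in pairs])
    return torch.sigmoid(mlp(cat)).squeeze(-1)

def contrast_loss(positives, negatives, eps=1e-8):
    s_pos, s_neg = pair_scores(positives), pair_scores(negatives)
    # Push positive-pair scores toward 1 and negative-pair scores toward 0.
    return -(torch.log(s_pos + eps).mean() + torch.log(1.0 - s_neg + eps).mean())
```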
Step 3.2: the loss function $L(\theta)$ of the whole network is obtained by adding the original emotion classification loss function and the contrast loss function, in the specific form:

$L(\theta) = \sum_{l} \sum_{t=1}^{T_l} \bigl( L_e(u_t) + \lambda\, L_c(D_t) \bigr)$

where $\theta$ denotes all parameters of the whole network; $T_l$ is the number of dialogue turns contained in the $l$-th dialogue of the training data; $L_e(u_t)$ is the emotion classification loss for the target utterance $u_t$; $L_c(D_t)$ is the contrast loss function; $\lambda$ is the contrast-loss intensity parameter, used to control the strength of the context enhancement task; and $L(\theta)$ is the loss function of the whole network.
By jointly training on these two tasks, the whole network achieves the effect of enhancing the context understanding of the existing dialogue emotion classification model; a sketch of the joint objective is given below.
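Finally, a sketch of the joint objective of step 3.2, reusing the contrast_loss sketch above; the value assigned to λ (lambda_c) and the per-turn loop over a single dialogue are illustrative assumptions.

```python
import torch.nn.functional as F

lambda_c = 0.5   # assumed value of the contrast-loss intensity parameter λ

def joint_loss(logits, labels, pos_pairs, neg_pairs):
    """logits: (T_l, num_emotions); labels: (T_l,); pos_pairs / neg_pairs: per-turn pair lists."""
    total = 0.0
    for t in range(logits.size(0)):
        l_e = F.cross_entropy(logits[t:t + 1], labels[t:t + 1])     # emotion classification loss L_e(u_t)
        l_c = contrast_loss(pos_pairs[t], neg_pairs[t])             # contrast loss L_c(D_t)
        total = total + l_e + lambda_c * l_c
    return total                                                    # summed over the T_l turns of one dialogue
```

In this reading, the contrast term acts as an auxiliary task, so λ trades off the strength of the context enhancement task against the original classification objective.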
Experiment verification
Three representative dialogue emotion models were chosen as baseline models, and experiments were performed on the MELD and IEMOCAP datasets, respectively. The results show that, after adding the context understanding enhancement method proposed in this application, classification accuracy improves by 2-3% over the baseline models; moreover, in perturbation tests in which the context of each utterance is replaced, the method exhibits stronger robustness and still maintains a higher classification accuracy.

Claims (2)

1. A contrast type context understanding enhancement method for dialogue emotion recognition is characterized by comprising the following steps:
step 1: extracting hidden state sequences for emotion classification based on the existing dialogue emotion analysis frame;
Step 1.1: vectorizing a conversation text of a conversation to obtain a corresponding distributed text representation;
the text sequence content of the dialogue is represented as a text sequence in vector form:

$U_l = \{u_1, u_2, \ldots, u_{T_l}\}$

where $u$ denotes the conversation text, $T_l$ denotes the number of conversation turns in the $l$-th dialogue of the training data, and each $u_t$ is the text information of one turn;
Step 1.2: sending the text representation into the existing dialogue emotion classification model to obtain the emotion representation sequence before the model's fully connected classification layer;
given an existing dialogue emotion classification model CER, the obtained vectorized text sequence is fed into the emotion classification model, and the emotion sequence representation before the model's fully connected layer for emotion classification is obtained:

$H = \mathrm{CER}(U_l) = \{h_1, h_2, \ldots, h_{T_l}\}$

where $H$ is the emotion sequence representation, each $h_t$ is the emotion vector representation of the utterance at time $t$, and CER is the existing dialogue emotion classification model;
step 2: constructing a contrast sample pair containing a context semantic perception mode based on the extracted sequence representation;
Step 2.1: encoding the historical information of each target utterance emotion representation to be classified to obtain an abstract representation of its context;
using a sequence model $\overrightarrow{f}$, the historical utterance information of each target utterance emotion representation $h_k$ to be classified is encoded to obtain the abstract representation of the context:

$\overrightarrow{c}_k = \overrightarrow{f}(h_1, h_2, \ldots, h_{k-1})$

where $h_{k-1}$ is the emotion vector of the utterance at time $k-1$;
Step 2.2: taking the target utterance and the utterance representations adjacent to it in the same direction as positive examples of the target utterance's context representation, and taking utterance emotion representations sampled from other, unrelated dialogues as negative examples, thereby completing the construction of the comparison sample pairs;
taking the representations of the target utterance and of the $w$ utterances following it in the same direction as positive examples of the target's context representation, and forming the set of forward positive example pairs:

$P_k^{\rightarrow} = \{\, p_k = (\overrightarrow{c}_k, h_j) \mid k \le j \le k + w \,\}$

where $P_k^{\rightarrow}$ denotes the set of forward positive sample pairs, $p_k$ denotes a single forward positive pair, $h_{k+w}$ is the emotion vector of the utterance at time $k+w$ with $k+w < T_l$, and $k$ is the position of the target utterance in the conversation;
using a sequence model in the opposite direction, $\overleftarrow{f}$, the future sequence information of the target utterance's emotion representation is encoded to obtain the abstract context representation $\overleftarrow{c}_k = \overleftarrow{f}(h_{T_l}, h_{T_l-1}, \ldots, h_{k+1})$, and the positive example pairs in the opposite direction are then constructed:

$P_k^{\leftarrow} = \{\, (\overleftarrow{c}_k, h_j) \mid k - w \le j \le k \,\}$

where $P_k^{\leftarrow}$ denotes the set of backward positive sample pairs and $h_{k-w}$ is the emotion representation vector of the utterance at time $k-w$;
in summary, the set $P_k$ of all positive sample pairs of the target utterance is obtained:

$P_k = P_k^{\rightarrow} \cup P_k^{\leftarrow}$
Step 2.3: repeating steps 2.1 and 2.2 in the opposite direction of the dialogue flow to construct corresponding comparison sample pairs for each target utterance;
combining the forward and backward context representations $\overrightarrow{c}_k$ and $\overleftarrow{c}_k$, utterance emotion representations sampled from other, unrelated dialogue data are taken as negative examples to construct the negative sample pairs:

$N_k^{\rightarrow} = \{\, n_k = (\overrightarrow{c}_k, \tilde{h}) \,\}, \qquad N_k^{\leftarrow} = \{\, n_k = (\overleftarrow{c}_k, \tilde{h}) \,\}$

where $\tilde{h}$ denotes an utterance emotion representation randomly sampled from another dialogue, $N_k^{\rightarrow}$ and $N_k^{\leftarrow}$ respectively denote the negative sample pair sets in the forward and backward directions, and $n_k$ is a single comparison sample pair therein;
the set of negative pairs $N_k$ for the target utterance $h_k$ is represented as:

$N_k = N_k^{\rightarrow} \cup N_k^{\leftarrow}$
$P_k$ and $N_k$ contain the patterns that enable the model to perceive the dialogue context; for the emotion representation of each utterance, the corresponding comparison sample pairs are obtained through the same process; combining them with the contrast loss allows the model to learn the features contained in these comparison samples;
Step 3: constructing a contrastive learning loss function so that the model learns, from the samples, the patterns they contain; adding the contrast loss and the emotion classification loss function and performing multi-task learning to complete the training of the network model;
and judging emotion in the target dialogue by using the dialogue emotion classification model, and realizing classification of dialogue emotion.
2. The method for enhancing the comparative context understanding of dialog emotion recognition as claimed in claim 1, wherein the implementation method of step 3 is as follows:
Step 3.1: constructing a contrast loss function so that the negative example pairs among the comparison samples are pushed apart and the positive example pairs are pulled closer;
for a target utterance $h_k$, its corresponding comparison sample set is $D_k = \{P_k, N_k\}$; for any pair $(c_j, h_j) \in D_k$, the two representations are first concatenated and a matching score $o_j$ between them is then computed by a fully connected perceptron (MLP):

$o_j = \mathrm{MLP}([\, c_j ; h_j \,])$

where $c_j$ and $h_j$ respectively denote the context representation and the utterance vector representation in the comparison pair, $[\cdot\,;\cdot]$ denotes concatenation, and MLP is a fully connected perceptron network;
then, the matching score $o_j$ is normalized into the range $(0, 1)$ by a sigmoid function:

$s_j = \mathrm{sigmoid}(o_j)$ (9)
based on the matching scores computed for each sample pair, a contrast loss is constructed that increases the matching scores of positive sample pairs and decreases those of negative sample pairs:

$L_c(D_k) = -\dfrac{1}{|P_k|}\sum_{p \in P_k} \log s_p \;-\; \dfrac{1}{|N_k|}\sum_{n \in N_k} \log\bigl(1 - s_n\bigr)$

where $s_p$ and $s_n$ respectively denote the matching scores of the corresponding positive and negative sample pairs, $|P_k|$ denotes the number of positive pairs, $|N_k|$ denotes the number of corresponding negative pairs, $L_c$ is the contrast loss for each target utterance, and $D_k$ is its comparison sample set;
Step 3.2: the loss function $L(\theta)$ of the whole network is obtained by adding the original emotion classification loss function and the contrast loss function, in the specific form:

$L(\theta) = \sum_{l} \sum_{t=1}^{T_l} \bigl( L_e(u_t) + \lambda\, L_c(D_t) \bigr)$

where $\theta$ denotes all parameters of the whole network; $T_l$ is the number of dialogue turns contained in the $l$-th dialogue of the training data; $L_e(u_t)$ is the emotion classification loss for the target utterance $u_t$; $L_c(D_t)$ is the contrast loss function; $\lambda$ is the contrast-loss intensity parameter, used to control the strength of the context enhancement task; and $L(\theta)$ is the loss function of the whole network;
the whole network achieves the effect of enhancing the context understanding of the existing dialogue emotion classification model by jointly training on these two tasks.
CN202111217510.0A 2021-10-19 2021-10-19 Contrast type context understanding enhancement method for dialogue emotion recognition Active CN113946670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111217510.0A CN113946670B (en) 2021-10-19 2021-10-19 Contrast type context understanding enhancement method for dialogue emotion recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111217510.0A CN113946670B (en) 2021-10-19 2021-10-19 Contrast type context understanding enhancement method for dialogue emotion recognition

Publications (2)

Publication Number Publication Date
CN113946670A CN113946670A (en) 2022-01-18
CN113946670B true CN113946670B (en) 2024-05-10

Family

ID=79331406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111217510.0A Active CN113946670B (en) 2021-10-19 2021-10-19 Contrast type context understanding enhancement method for dialogue emotion recognition

Country Status (1)

Country Link
CN (1) CN113946670B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756678B (en) * 2022-03-25 2024-05-14 鼎富智能科技有限公司 Unknown intention text recognition method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108874972A (en) * 2018-06-08 2018-11-23 青岛里奥机器人技术有限公司 A kind of more wheel emotion dialogue methods based on deep learning
KR20200119410A (en) * 2019-03-28 2020-10-20 한국과학기술원 System and Method for Recognizing Emotions from Korean Dialogues based on Global and Local Contextual Information
CN112949684A (en) * 2021-01-28 2021-06-11 天津大学 Multimodal dialogue emotion information detection method based on reinforcement learning framework
CN113065344A (en) * 2021-03-24 2021-07-02 大连理工大学 Cross-corpus emotion recognition method based on transfer learning and attention mechanism
CN113254625A (en) * 2021-07-15 2021-08-13 国网电子商务有限公司 Emotion dialogue generation method and system based on interactive fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10818312B2 (en) * 2018-12-19 2020-10-27 Disney Enterprises, Inc. Affect-driven dialog generation

Also Published As

Publication number Publication date
CN113946670A (en) 2022-01-18

Similar Documents

Publication Publication Date Title
Tan et al. The artificial intelligence renaissance: deep learning and the road to human-level machine intelligence
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN113516968B (en) End-to-end long-term speech recognition method
CN113987179B (en) Dialogue emotion recognition network model based on knowledge enhancement and backtracking loss, construction method, electronic equipment and storage medium
CN112037773B (en) N-optimal spoken language semantic recognition method and device and electronic equipment
CN115964467A (en) Visual situation fused rich semantic dialogue generation method
Chen et al. Distilled binary neural network for monaural speech separation
CN113065344A (en) Cross-corpus emotion recognition method based on transfer learning and attention mechanism
CN114091478A (en) Dialog emotion recognition method based on supervised contrast learning and reply generation assistance
CN111899766B (en) Speech emotion recognition method based on optimization fusion of depth features and acoustic features
CN112905772A (en) Semantic correlation analysis method and device and related products
CN111653270B (en) Voice processing method and device, computer readable storage medium and electronic equipment
CN113946670B (en) Contrast type context understanding enhancement method for dialogue emotion recognition
CN110069611A (en) A kind of the chat robots reply generation method and device of theme enhancing
CN116304973A (en) Classroom teaching emotion recognition method and system based on multi-mode fusion
CN114898779A (en) Multi-mode fused speech emotion recognition method and system
Zhao et al. Knowledge-aware bayesian co-attention for multimodal emotion recognition
CN115249479A (en) BRNN-based power grid dispatching complex speech recognition method, system and terminal
CN113656569A (en) Generating type dialogue method based on context information reasoning
CN113177113A (en) Task type dialogue model pre-training method, device, equipment and storage medium
CN111414466A (en) Multi-round dialogue modeling method based on depth model fusion
CN111160512A (en) Method for constructing dual-discriminator dialog generation model based on generative confrontation network
CN115795010A (en) External knowledge assisted multi-factor hierarchical modeling common-situation dialogue generation method
CN117980915A (en) Contrast learning and masking modeling for end-to-end self-supervised pre-training
CN115169363A (en) Knowledge-fused incremental coding dialogue emotion recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant