CN115033695A - Long-dialog emotion detection method and system based on common sense knowledge graph - Google Patents

Long-dialog emotion detection method and system based on common sense knowledge graph

Info

Publication number
CN115033695A
CN115033695A (application CN202210676215.XA)
Authority
CN
China
Prior art keywords
speech
utterance
attention
time
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210676215.XA
Other languages
Chinese (zh)
Inventor
聂为之
鲍玉茹
刘安安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202210676215.XA priority Critical patent/CN115033695A/en
Publication of CN115033695A publication Critical patent/CN115033695A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of emotion analysis and discloses a long-dialog emotion detection method and system based on a common sense knowledge graph. A graph model g_t(V_t, E_t) is constructed from the utterance features at time t; the prior knowledge of the utterance is added to the graph model g_t(V_t, E_t) as new nodes to obtain an updated graph model g_t*(V_t, E_t); the updated graph model g_t*(V_t, E_t) is processed with a GNN model to obtain updated utterance features; the final utterance topic f_t at time t is determined from the first m utterance elements preceding the utterance element u_i at time t; and the long-dialog emotion label is determined from the updated utterance features and the final utterance topic f_t. By adding prior knowledge as new nodes to the graph model g_t(V_t, E_t), the invention exploits the role of prior knowledge in dialogue emotion recognition and the relationships between speakers, integrates the influence of self-emotion and mutual emotion, and recognizes the emotion of a dialogue quickly and effectively.

Description

Long-dialog emotion detection method and system based on common sense knowledge graph
Technical Field
The invention relates to the technical field of emotion analysis, and in particular to a long-dialog emotion detection method and system based on a common sense knowledge graph.
Background
In recent years, with the rapid development of computer technology and social networks, conversation robots have gradually replaced human workers in many application fields, such as healthcare, early education, and court proceedings. To ensure that a conversation robot communicates coherently, not only the setting of a conversation but also the emotion of the human participants at each stage of the conversation needs to be assessed, so research on dialogue emotion detection has become increasingly important.
For a conversation, capturing human emotion within its rapidly changing context is a central problem in dialogue emotion detection. Classic methods such as LSTMs and RNNs can extract only part of the context information and cannot achieve fast and effective recognition. Li et al. propose a bidirectional recurrent unit for dialogue emotion analysis, which includes a generalized neural tensor block for obtaining context information and a two-channel classifier for emotion classification. However, a conversation often involves multiple speakers, each affected both by their own emotion and by the emotions of other speakers; the above model mainly considers temporal influence, ignores the relationship information between speakers, and does not integrate the influences of self-emotion and mutual emotion. Hazarika et al. propose a deep neural network called the Conversational Memory Network (CMN) and discuss these two factors, self-emotion and mutual emotion, in depth: each speaker's past utterances are modeled as memories, these memories are merged using attention-based hops, and context and sequence information is propagated using long short-term memory networks (LSTM) and gated recurrent units (GRU). However, the extracted context information remains limited and the practical performance is poor.
In the prior art of dialogue emotion detection, the extracted context information is limited and the emotion of a dialogue cannot be identified quickly and effectively; the role of prior knowledge in dialogue emotion recognition and the relationship information between speakers are generally ignored, and the influences of self-emotion and mutual emotion are not integrated.
Disclosure of Invention
The invention aims to provide a long-dialog emotion detection method and system based on a common sense knowledge graph, which can quickly identify the emotion of a dialogue by combining prior knowledge and can integrate the influence of self-emotion and mutual emotion.
In order to achieve the purpose, the invention provides the following scheme:
a long-conversation emotion detection method of a common sense knowledge graph, comprising:
constructing a graph model g_t(V_t, E_t) based on the utterance features at time t, wherein V_t is the set of nodes, E_t is the set of undirected edges between different nodes v_i and v_j, with i ∈ 1…n and j ∈ 1…n, and the graph model g_t(V_t, E_t) represents the dialogue data at time t;
extracting the prior knowledge of the utterance at time t;
adding the prior knowledge of the utterance to the graph model g_t(V_t, E_t) as new nodes to obtain an updated graph model g_t*(V_t, E_t);
updating the graph model g_t*(V_t, E_t) with a GNN model to obtain updated utterance features;
determining the final utterance topic f_t at time t based on the first m utterance elements preceding the utterance element u_i at time t, wherein m = 1…i-1 and i = 1…n;
determining a long-dialog emotion label based on the updated utterance features and the final utterance topic f_t at time t; the long-dialog emotion labels comprise: happiness, frustration, surprise, neutral, fear, disgust, and anger.
Optionally, extracting the prior knowledge of the utterance at time t specifically comprises:
obtaining the retrieval results of each utterance element u_i, the retrieval results being the top k results in triple format;
obtaining the retrieval result set of the utterance at time t based on the retrieval results of the utterance elements u_i;
merging each retrieval result in the set based on the retrieval result set to obtain the prior knowledge of the utterance at time t.
Optionally, determining the final utterance topic f_t at time t based on the first m utterance elements preceding the utterance element u_i at time t specifically comprises:
obtaining the first m utterance elements preceding the utterance element u_i at time t;
converting the first m utterance elements into m utterance features;
inputting the first m utterance elements of 3×N different utterance elements u_i at time t into a self-attention layer to obtain N sample utterance topics f_t, N positive-sample utterance topics f_p, and N negative-sample utterance topics f_n, wherein consecutive topic extractions are separated by a time interval of at least T_interval, and the N sample utterance topics f_t, N positive-sample utterance topics f_p, and N negative-sample utterance topics f_n form triplets (f_t, f_p, f_n);
constructing a loss function based on the triplet (f_t, f_p, f_n), wherein sim(f_t, f_p) is the dot product, i.e. cosine similarity, between topics f_t and f_p, sim(f_t, f_n) is the dot product, i.e. cosine similarity, between topics f_t and f_n, f_t is a sample utterance topic, f_p is the positive-sample utterance topic of f_t, f_n is the negative-sample utterance topic of f_t, N is the number of training samples, n ∈ 1…N, and α is a model parameter;
determining the final utterance topic f_t at time t based on the loss function.
Optionally, determining the long-dialog emotion label based on the updated utterance features and the final utterance topic f_t at time t specifically comprises:
inputting the updated utterance features and the final utterance topic f_t at time t into a transformer decoder structure to obtain final utterance features, wherein l is the maximum number of layers in the transformer decoder structure;
inputting the final utterance features into a softmax function to obtain the long-dialog emotion label.
Optionally, one layer of the transformer decoder structure comprises:
a self-attention module for fusing and updating the utterance features to obtain first utterance features, wherein k = 1…l;
a cross-attention module connected with the self-attention module for fusing and updating the first utterance features to obtain second utterance features, the cross-attention module being further configured to fuse and update the second utterance features to obtain third utterance features, wherein k = 1…l;
a feed-forward neural network module connected with the cross-attention module for updating the third utterance features to obtain updated fourth utterance features.
Optionally, the self-attention module fuses and updates the utterance features through a multi-head self-attention computation, in which the k-th layer self-attention module learns a weight parameter for the attention Key and a first weight parameter for the attention Value; MultiHead is the multi-head self-attention function, Attention is the self-attention function, softmax is the classification function, and the computation also uses the variance of the corresponding projected features.
Optionally, the cross-attention module fuses and updates the first utterance features through a multi-head attention computation in which the k-th layer cross-attention module learns a second weight parameter for the attention Value; MultiHead is the multi-head self-attention function, Attention is the self-attention function, softmax is the classification function, and the computation also uses the variance of the corresponding projected features.
The cross-attention module is further configured to fuse and update the second utterance features through a computation of the same form, in which the k-th layer cross-attention module learns a third weight parameter for the attention Value.
Optionally, the feed-forward neural network module updates the third utterance features to obtain the updated fourth utterance features constructed by the k-th layer transformer decoder.
Based on the above method, the invention further provides a long-dialog emotion detection system based on a common sense knowledge graph, the long-dialog emotion detection system comprising:
a graph model construction module for constructing a graph model g_t(V_t, E_t) from the utterance features at time t, wherein V_t is the set of nodes, E_t is the set of undirected edges between different nodes v_i and v_j, with i ∈ 1…n and j ∈ 1…n, and the graph model g_t(V_t, E_t) represents the dialogue data at time t;
a prior knowledge acquisition module for extracting the prior knowledge of the utterance at time t;
a graph model update module connected with the prior knowledge acquisition module for adding the prior knowledge of the utterance to the graph model g_t(V_t, E_t) as new nodes to obtain an updated graph model g_t*(V_t, E_t);
an utterance feature update module connected with the graph model update module for updating the graph model g_t*(V_t, E_t) with a GNN model to obtain updated utterance features;
an utterance topic determination module for determining the final utterance topic f_t at time t based on the first m utterance elements preceding the utterance element u_i at time t, wherein m = 1…i-1 and i = 1…n;
a long-dialog emotion label determination module connected with the utterance feature update module and the utterance topic determination module for determining a long-dialog emotion label based on the updated utterance features and the final utterance topic f_t at time t; the long-dialog emotion labels comprise: happiness, frustration, surprise, neutral, fear, disgust, and anger.
Optionally, the prior knowledge acquisition module specifically comprises:
a retrieval result acquisition unit for obtaining the retrieval results of each utterance element u_i, the retrieval results being the top k results in triple format;
a retrieval result set acquisition unit connected with the retrieval result acquisition unit for obtaining the retrieval result set of the utterance at time t based on the retrieval results of the utterance elements u_i;
a prior knowledge acquisition unit connected with the retrieval result set acquisition unit for merging each retrieval result in the set based on the retrieval result set to obtain the prior knowledge of the utterance at time t.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a long-dialog emotion detection method and system of a common knowledge map, which passes the speech characteristics at the time t
Figure BDA0003694725350000074
Construction of graph model g t (V t ,E t ) The words are combined
Figure BDA0003694725350000075
As new nodes are added to the graph model g t (V t ,E t ) Get the updated graph model g t * (V t ,E t ) (ii) a Updating graph model g with GNN model t * (V t ,E t ) Obtaining updated utterance features; speech element u based on time t i The first m utterances, determining the final t-time utterance subject f t Based on the updated speech feature and the final t-time speech subject f t Determining a long conversation emotion label; the long dialog emotion tag comprises: happiness, depression, surprise, emotional colorfulness, fear, disgust and anger. The invention adds the prior knowledge as a new node to the graph model g t (V t ,E t ) In the method, the function of dialogue emotion recognition and the relation between speakers are described, the influence of self emotion and mutual emotion is integrated, and the emotion of the dialogue is recognized quickly and effectively.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a flowchart of the long-dialog emotion detection method based on a common sense knowledge graph according to an embodiment of the present invention;
FIG. 2 is a flowchart of updating the utterance features according to an embodiment of the present invention;
FIG. 3 is a flowchart of determining the final utterance topic at time t according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of one layer of the transformer decoder structure according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the long-dialog emotion detection system based on a common sense knowledge graph.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a long-dialog emotion detection method and system based on a common sense knowledge graph, which can integrate the influence of self-emotion and mutual emotion and can quickly and effectively identify the emotion of a dialogue.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to FIG. 1 of the drawings, the invention provides a long-dialog emotion detection method based on a common sense knowledge graph, which comprises the following steps:
S1. Construct a graph model g_t(V_t, E_t) based on the utterance features at time t, wherein V_t is the set of nodes, E_t is the set of undirected edges between different nodes v_i and v_j, with i ∈ 1…n and j ∈ 1…n, and the graph model g_t(V_t, E_t) represents the dialogue data at time t.
Specifically, E_t contains two types of edges. The first is the temporal edge, which represents the order in which utterances v_i and v_j occur in the conversation: when speaker A produces utterance feature v_i and speaker B then produces utterance feature v_j, there is a chronological order between nodes v_i and v_j; if v_i precedes v_j, the edge exists and its weight is defined as 1, otherwise the edge is absent and its weight is 0. The second is the semantic edge, which measures the similarity between utterances v_i and v_j by cosine distance.
S2. Extract the prior knowledge of the utterance at time t. The specific steps are as follows:
S21. Obtain the retrieval results of each utterance element u_i; the retrieval results are the top k results in triple format, wherein each similar utterance in the top k results consists of triples whose relations include intent, reaction, and the like.
Specifically, the prior knowledge of the utterance is retrieved by applying a common sense knowledge retrieval method to the common sense knowledge corpus ATOMIC and the common sense knowledge base ConceptNet.
S22. Obtain the retrieval result set of the utterance at time t based on the retrieval results of the utterance elements u_i.
S23. Based on the retrieval result set, merge each retrieval result in the set to obtain the prior knowledge of the utterance at time t.
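Steps S21–S23 can be pictured with the following sketch. The patent does not publish a retrieval interface for ATOMIC/ConceptNet, so retrieve_topk_triples and its tiny in-memory table are hypothetical stand-ins; only the top-k collection and merging logic follows the steps above.
# Hypothetical stand-in for the ATOMIC/ConceptNet lookup in S21.
MOCK_KNOWLEDGE = {
    "i lost my keys": [("PersonX loses keys", "xReact", "annoyed"),
                       ("PersonX loses keys", "xIntent", "to find them")],
}

def retrieve_topk_triples(utterance_element, k=5):
    # Return the top-k (head, relation, tail) triples for one utterance element u_i.
    return MOCK_KNOWLEDGE.get(utterance_element.lower(), [])[:k]

def prior_knowledge_for_utterance(utterance_elements, k=5):
    """S21-S22: collect the top-k triples for every utterance element u_i,
    then S23: merge the retrieval results into the prior knowledge of the utterance."""
    result_set = [retrieve_topk_triples(u, k) for u in utterance_elements]
    merged = []
    for results in result_set:          # merge every retrieval result in the set
        for triple in results:
            if triple not in merged:    # drop duplicates while merging
                merged.append(triple)
    return merged

# Example: prior_knowledge_for_utterance(["I lost my keys"])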
Specifically, this prior knowledge is encoded into feature vectors by a BiLSTM and added to the graph as new nodes, serving as auxiliary information to assist utterance embedding learning.
Specifically, the utterances at time t are converted into the utterance features at time t through a bidirectional long short-term memory (BiLSTM) network model.
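A BiLSTM encoder of the kind described above can be sketched with PyTorch as follows; the embedding and hidden dimensions and the use of the concatenated final hidden states as the utterance feature are illustrative assumptions, not values stated in the patent.
import torch
import torch.nn as nn

class UtteranceEncoder(nn.Module):
    """Encode the token-embedding sequence of one utterance into a single feature vector."""
    def __init__(self, emb_dim=300, hidden_dim=128):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, token_embeddings):              # (batch, seq_len, emb_dim)
        outputs, (h_n, _) = self.bilstm(token_embeddings)
        # Concatenate the final hidden states of the forward and backward directions.
        feature = torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * hidden_dim)
        return feature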
S3. Add the prior knowledge of the utterance to the graph model g_t(V_t, E_t) as new nodes to obtain an updated graph model g_t*(V_t, E_t).
S4. Update the graph model g_t*(V_t, E_t) with the GNN model to obtain the updated utterance features.
Specifically, referring to FIG. 2 of the drawings, the utterances at time t are converted into the utterance features at time t through a BiLSTM (bidirectional long short-term memory) layer; the graph model g_t(V_t, E_t) is constructed, and prior knowledge is retrieved from the common sense knowledge corpus ATOMIC and the common sense knowledge base ConceptNet; the prior knowledge is used to update the graph model, and the node features in the graph are updated through a classical GNN model to obtain the updated utterance features.
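The patent calls for a "classical GNN model" without fixing a particular one. A single mean-aggregation message-passing layer is one minimal choice, sketched below in plain NumPy; the aggregation rule, self-loops, and ReLU are assumptions of this sketch rather than details given in the patent.
import numpy as np

def gnn_layer(node_features, adjacency, weight):
    """One message-passing update for the nodes of g_t*(V_t, E_t).

    node_features: (n, d) array, one row per node (utterance + prior-knowledge nodes).
    adjacency:     (n, n) array of edge weights (temporal/semantic edges).
    weight:        (d, d_out) learnable projection matrix.
    """
    adj = adjacency + np.eye(adjacency.shape[0])                  # add self-loops
    degree = adj.sum(axis=1, keepdims=True)
    messages = adj @ node_features / np.maximum(degree, 1e-8)     # mean aggregation over neighbours
    return np.maximum(messages @ weight, 0.0)                     # linear projection + ReLU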
S5. Determine the final utterance topic f_t at time t based on the first m utterance elements preceding the utterance element u_i at time t, wherein m = 1…i-1 and i = 1…n. The specific steps are as follows:
S51. Obtain the first m utterance elements preceding the utterance element u_i at time t.
S52. Convert the first m utterance elements into m utterance features.
S53. Input the first m utterance elements of 3×N different utterance elements u_i at time t into the self-attention layer to obtain N sample utterance topics f_t, N positive-sample utterance topics f_p, and N negative-sample utterance topics f_n, wherein consecutive topic extractions are separated by a time interval of at least T_interval, and the N sample utterance topics f_t, N positive-sample utterance topics f_p, and N negative-sample utterance topics f_n form triplets (f_t, f_p, f_n).
S54. Construct a loss function based on the triplet (f_t, f_p, f_n), wherein sim(f_t, f_p) is the dot product, i.e. cosine similarity, between topics f_t and f_p, sim(f_t, f_n) is the dot product, i.e. cosine similarity, between topics f_t and f_n, f_t is a sample utterance topic, f_p is the positive-sample utterance topic of f_t, f_n is the negative-sample utterance topic of f_t, N is the number of training samples, n ∈ 1…N, and α is a model parameter.
Specifically, the objective of the loss function is to reduce the difference between a sample utterance topic f_t and its positive-sample utterance topic f_p while increasing the difference between the sample utterance topic f_t and its negative-sample utterance topic f_n.
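The formula itself is reproduced in the published text only as an image. Given the definitions of sim(·,·), the margin-like parameter α, and the stated objective, a margin-based triplet loss of the following form is one plausible reading (the exact formula in the patent may differ):
L = \frac{1}{N}\sum_{n=1}^{N}\max\!\left(0,\ \operatorname{sim}\!\left(f_t^{(n)}, f_n^{(n)}\right) - \operatorname{sim}\!\left(f_t^{(n)}, f_p^{(n)}\right) + \alpha\right)
Here the superscript (n) indexes the n-th training triplet, so minimizing L pushes sim(f_t, f_p) above sim(f_t, f_n) by at least the margin α.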
Specifically, contrastive learning is performed on the triplet (f_t, f_p, f_n); the task of contrastive learning is to make the sample and its positive sample have the same or similar feature vectors while keeping the representation of its negative sample dissimilar. Through contrastive learning on (f_t, f_p, f_n), the changing utterance topic is obtained.
S55. Determine the final utterance topic f_t at time t based on the loss function.
Specifically, referring to FIG. 3 of the drawings, the first m utterances of the current conversation are input into a BiLSTM (bidirectional long short-term memory) layer to obtain m utterance features; the m utterance features are input into the self-attention mechanism to obtain N sample utterance topics f_t, N positive-sample utterance topics f_p, and N negative-sample utterance topics f_n; and contrastive learning is performed on the triplets (f_t, f_p, f_n) to obtain the current topic.
Specifically, conversation topic extraction mainly focuses on conversation background topics in different periods and guides speech emotion recognition.
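A minimal sketch of this topic-extraction step is given below: the m utterance features are pooled into one topic vector through a self-attention layer. The learned pooling query and the specific PyTorch module are assumptions of this sketch; the patent only states that a self-attention layer produces the topic.
import torch
import torch.nn as nn

class TopicExtractor(nn.Module):
    """Pool the first m utterance features into one topic vector f_t."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, dim))   # learned pooling query

    def forward(self, utterance_features):                   # (batch, m, dim)
        batch = utterance_features.size(0)
        q = self.query.expand(batch, -1, -1)
        topic, _ = self.attn(q, utterance_features, utterance_features)
        return topic.squeeze(1)                               # (batch, dim), used as f_t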
S6. Determine the long-dialog emotion label based on the updated utterance features and the final utterance topic f_t at time t; the long-dialog emotion labels comprise: happiness, frustration, surprise, neutral, fear, disgust, and anger.
Specifically, referring to FIG. 4 of the drawings, determining the long-dialog emotion label based on the updated utterance features and the final utterance topic f_t at time t specifically comprises:
inputting the updated utterance features and the final utterance topic f_t at time t into a transformer decoder structure to obtain the final utterance features, wherein l is the maximum number of layers in the transformer decoder structure;
inputting the final utterance features into a softmax function to obtain the long-dialog emotion label.
Specifically, the softmax classifier is trained with a cross-entropy loss, where L_final is the cross-entropy loss value computed over the softmax outputs, the inputs are the utterance features updated by the k-th layer of the transformer decoder structure (k = 1…l), and M is the number of emotion labels.
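This loss formula also appears only as an image in the published text. Under the usual convention of a linear projection followed by softmax over M emotion classes, the standard form would read as follows; the projection parameters W and b and the averaging over N samples are assumptions of this reconstruction:
L_{final} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{M} y_{n,c}\,\log \hat{y}_{n,c},
\qquad \hat{y}_{n} = \operatorname{softmax}\!\left(W\,\hat{c}_{n}^{(l)} + b\right)
where \hat{c}_{n}^{(l)} denotes the final utterance feature produced by the l-th decoder layer and y_{n,c} the one-hot ground-truth emotion label.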
Specifically, the transformer decoder structure has l layers in total, wherein one layer of the transformer decoder structure comprises: a self-attention module, a cross-attention module, and a feed-forward neural network module.
the self-attention module is used for fusing and updating speech features
Figure BDA0003694725350000114
Obtaining a first speech feature
Figure BDA0003694725350000115
Wherein k is 1 … l;
in particular, the self-attention module applies the following formula specifically to the utterance feature
Figure BDA0003694725350000116
And (3) performing fusion and updating:
Figure BDA0003694725350000117
Figure BDA0003694725350000118
Figure BDA0003694725350000119
wherein,
Figure BDA00036947253500001110
for the weight parameter of the attention Key index that the k-th self-attention module needs to learn,
Figure BDA00036947253500001111
is the first weight parameter of the Attention Value that the k-th self-Attention module needs to learn, MultiHead is the multi-headed self-Attention function, Attention is the self-Attention function, softmax is the classification function,
Figure BDA00036947253500001112
is composed of
Figure BDA00036947253500001113
The variance of (c).
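The self-attention formulas are likewise reproduced only as images. The symbol glossary above (Key/Value weights, MultiHead, Attention, softmax, and a variance-like scaling term) points to the standard scaled dot-product form, reproduced here under the usual Transformer conventions as an assumed reconstruction, with d_k the Key dimension:
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\operatorname{MultiHead}(Q, K, V) = \operatorname{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O},
\qquad \mathrm{head}_i = \operatorname{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})
For the self-attention module, Q, K, and V would all be the utterance features entering the k-th layer; the √d_k scaling is the likely counterpart of the "variance" term in the glossary.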
The cross-attention module is connected with the self-attention module and is used for fusing and updating the first utterance features to obtain the second utterance features; the cross-attention module is further used for fusing and updating the second utterance features to obtain the third utterance features, wherein k = 1…l.
specifically, the cross-attention module specifically employs the following formula for the firstA speech feature
Figure BDA0003694725350000122
And (3) performing fusion and updating:
Figure BDA0003694725350000123
Figure BDA0003694725350000124
wherein,
Figure BDA0003694725350000125
is a second weight parameter of Attention Value that needs to be learned by the k-th layer cross Attention module, MultiHead is a multi-head self-Attention function, Attention is a self-Attention function, softmax is a classification function,
Figure BDA0003694725350000126
is composed of
Figure BDA0003694725350000127
The variance of (a);
the cross attention module is specifically configured to apply the following formula to the second speech feature
Figure BDA0003694725350000128
And (3) performing fusion and updating:
Figure BDA0003694725350000129
Figure BDA00036947253500001210
wherein,
Figure BDA00036947253500001211
is the k-th layer cross attention module learningThe third weight parameter of the learned Attention Value, MultiHead, Attention, softmax, classification function,
Figure BDA0003694725350000131
is composed of
Figure BDA0003694725350000132
The variance of (c).
The feed-forward neural network module is connected with the cross-attention module and is used for updating the third utterance features to obtain the updated fourth utterance features.
Specifically, the feed-forward neural network module updates the third utterance features to obtain the updated fourth utterance features constructed by the k-th layer transformer decoder.
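Putting the three modules together, one layer of the decoder described above can be sketched as follows. The residual connections, layer normalisation, dimension choices, and the collapsing of the two-step cross-attention into a single call are standard Transformer conventions assumed here for brevity rather than details fixed by the patent.
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Self-attention over utterance features, cross-attention with the topic f_t, then an FFN."""
    def __init__(self, dim=256, heads=4, ffn_dim=1024):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, utterances, topic):
        # utterances: (batch, n, dim) updated utterance features; topic: (batch, 1, dim) topic f_t.
        x, _ = self.self_attn(utterances, utterances, utterances)   # first utterance features
        utterances = self.norm1(utterances + x)
        x, _ = self.cross_attn(utterances, topic, topic)            # second/third utterance features
        utterances = self.norm2(utterances + x)
        return self.norm3(utterances + self.ffn(utterances))        # fourth utterance features
Stacking l such layers and feeding the output into a linear-plus-softmax classifier would mirror the pipeline of steps S6 above.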
Referring to FIG. 5 of the drawings, based on the above method, the invention further provides a long-dialog emotion detection system based on a common sense knowledge graph, the long-dialog emotion detection system comprising: a graph model construction module 1, a prior knowledge acquisition module 2, a graph model update module 3, an utterance feature update module 4, an utterance topic determination module 5, and a long-dialog emotion label determination module 6.
The graph model construction module 1 is used for constructing a graph model g_t(V_t, E_t) from the utterance features at time t, wherein V_t is the set of nodes, E_t is the set of undirected edges between different nodes v_i and v_j, with i ∈ 1…n and j ∈ 1…n, and the graph model g_t(V_t, E_t) represents the dialogue data at time t.
the priori knowledge acquisition module 2 is used for utterances based on t time
Figure BDA00036947253500001310
Extracting utterances
Figure BDA00036947253500001311
A priori knowledge of;
the graph model updating module 3 is connected with the prior knowledge obtaining module 2, and the graph model updating module 3 is used for updating the utterance
Figure BDA00036947253500001312
As new nodes are added to the graph model g t (V t ,E t ) To obtain an updated graph model g t * (V t ,E t );
The speech feature updating module 4 is connected with the graph model updating module 3, and the speech feature updating module 4 is used for updating the graph model g by adopting a GNN model t * (V t ,E t ) Obtaining updated speech features according to the speech features;
the speaking theme determining module 5 is used for determining the speaking element u based on the t time i The first m utterances, determining the final t-time utterance subject f t Wherein m is 1 … i-1, i is 1 … n;
the long-dialog emotion label determination module 6 and the utterance feature updating module 4A topic determination module 5 connected to said long-dialog emotion tag determination module for determining a topic f of said final t-time utterance based on said updated utterance feature t Determining a long conversation emotion label; the long dialog emotion tag comprises: happy (happy), depressed (frustrated), surprised (surrise), anhedonic (neutral), fear (fear), disgust (distust), anger (anger).
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the description of the method part.
The invention provides a long-dialog emotion detection method and system based on a common sense knowledge graph, which preserve not only the correlations between utterances but also the temporal change information. By retrieving each utterance feature from the common sense knowledge graph, relevant prior knowledge is learned, global semantic information is obtained, the updating of the utterance features is guided, and the final emotion detection performance is improved. Emotion is detected using the transformer decoder structure, which achieves a deep fusion of the prior-knowledge-enriched utterance features and the utterance topic, introduces latent dialogue semantic information and prior knowledge, and improves the emotion recognition capability.
The principle and the embodiment of the present invention are explained by applying specific examples, and the above description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A long-dialog emotion detection method based on a common sense knowledge graph, the long-dialog emotion detection method comprising:
constructing a graph model g_t(V_t, E_t) based on the utterance features at time t, wherein V_t is the set of nodes, E_t is the set of undirected edges between different nodes v_i and v_j, with i ∈ 1…n and j ∈ 1…n, and the graph model g_t(V_t, E_t) represents the dialogue data at time t;
extracting the prior knowledge of the utterance at time t;
adding the prior knowledge of the utterance to the graph model g_t(V_t, E_t) as new nodes to obtain an updated graph model g_t*(V_t, E_t);
updating the graph model g_t*(V_t, E_t) with a GNN model to obtain updated utterance features;
determining the final utterance topic f_t at time t based on the first m utterance elements preceding the utterance element u_i at time t, wherein m = 1…i-1 and i = 1…n;
determining a long-dialog emotion label based on the updated utterance features and the final utterance topic f_t at time t, wherein the long-dialog emotion labels comprise: happiness, frustration, surprise, neutral, fear, disgust, and anger.
2. The long-dialog emotion detection method based on a common sense knowledge graph according to claim 1, wherein extracting the prior knowledge of the utterance at time t specifically comprises:
obtaining the retrieval results of each utterance element u_i, the retrieval results being the top k results in triple format;
obtaining the retrieval result set of the utterance at time t based on the retrieval results of the utterance elements u_i;
merging each retrieval result in the set based on the retrieval result set to obtain the prior knowledge of the utterance at time t.
3. The long-dialog emotion detection method based on a common sense knowledge graph according to claim 1, wherein determining the final utterance topic f_t at time t based on the first m utterance elements preceding the utterance element u_i at time t specifically comprises:
obtaining the first m utterance elements preceding the utterance element u_i at time t;
converting the first m utterance elements into m utterance features;
inputting the first m utterance elements of 3×N different utterance elements u_i at time t into a self-attention layer to obtain N sample utterance topics f_t, N positive-sample utterance topics f_p, and N negative-sample utterance topics f_n, wherein consecutive topic extractions are separated by a time interval of at least T_interval, and the N sample utterance topics f_t, N positive-sample utterance topics f_p, and N negative-sample utterance topics f_n form triplets (f_t, f_p, f_n);
constructing a loss function based on the triplet (f_t, f_p, f_n), wherein sim(f_t, f_p) is the dot product, i.e. cosine similarity, between topics f_t and f_p, sim(f_t, f_n) is the dot product, i.e. cosine similarity, between topics f_t and f_n, f_t is a sample utterance topic, f_p is the positive-sample utterance topic of f_t, f_n is the negative-sample utterance topic of f_t, N is the number of training samples, n ∈ 1…N, and α is a model parameter;
determining the final utterance topic f_t at time t based on the loss function.
4. The long-dialog emotion detection method based on a common sense knowledge graph according to claim 1, wherein determining the long-dialog emotion label based on the updated utterance features and the final utterance topic f_t at time t specifically comprises:
inputting the updated utterance features and the final utterance topic f_t at time t into a transformer decoder structure to obtain final utterance features, wherein l is the maximum number of layers in the transformer decoder structure;
inputting the final utterance features into a softmax function to obtain the long-dialog emotion label.
5. The long-dialog emotion detection method based on a common sense knowledge graph according to claim 4, wherein one layer of the transformer decoder structure comprises:
a self-attention module for fusing and updating the utterance features to obtain first utterance features, wherein k = 1…l;
a cross-attention module connected with the self-attention module for fusing and updating the first utterance features to obtain second utterance features, the cross-attention module being further configured to fuse and update the second utterance features to obtain third utterance features, wherein k = 1…l;
a feed-forward neural network module connected with the cross-attention module for updating the third utterance features to obtain updated fourth utterance features.
6. The long-dialog emotion detection method based on a common sense knowledge graph according to claim 5, wherein the self-attention module fuses and updates the utterance features through a multi-head self-attention computation in which the k-th layer self-attention module learns a weight parameter for the attention Key and a first weight parameter for the attention Value, MultiHead is the multi-head self-attention function, Attention is the self-attention function, and softmax is the classification function.
7. The long-dialog emotion detection method based on a common sense knowledge graph according to claim 5, wherein the cross-attention module fuses and updates the first utterance features through a multi-head attention computation in which the k-th layer cross-attention module learns a second weight parameter for the attention Value, MultiHead is the multi-head self-attention function, Attention is the self-attention function, and softmax is the classification function; and the cross-attention module is further configured to fuse and update the second utterance features through a computation of the same form, in which the k-th layer cross-attention module learns a third weight parameter for the attention Value.
8. The long-dialog emotion detection method based on a common sense knowledge graph according to claim 5, wherein the feed-forward neural network module updates the third utterance features to obtain the updated fourth utterance features constructed by the k-th layer transformer decoder.
9. A long-dialog emotion detection system based on a common sense knowledge graph, comprising:
a graph model construction module for constructing a graph model g_t(V_t, E_t) from the utterance features at time t, wherein V_t is the set of nodes, E_t is the set of undirected edges between different nodes v_i and v_j, with i ∈ 1…n and j ∈ 1…n, and the graph model g_t(V_t, E_t) represents the dialogue data at time t;
a prior knowledge acquisition module for extracting the prior knowledge of the utterance at time t;
a graph model update module connected with the prior knowledge acquisition module for adding the prior knowledge of the utterance to the graph model g_t(V_t, E_t) as new nodes to obtain an updated graph model g_t*(V_t, E_t);
an utterance feature update module connected with the graph model update module for updating the graph model g_t*(V_t, E_t) with a GNN model to obtain updated utterance features;
an utterance topic determination module for determining the final utterance topic f_t at time t based on the first m utterance elements preceding the utterance element u_i at time t, wherein m = 1…i-1 and i = 1…n;
a long-dialog emotion label determination module connected with the utterance feature update module and the utterance topic determination module for determining a long-dialog emotion label based on the updated utterance features and the final utterance topic f_t at time t, wherein the long-dialog emotion labels comprise: happiness, frustration, surprise, neutral, fear, disgust, and anger.
10. The long-dialog emotion detection system based on a common sense knowledge graph according to claim 9, wherein the prior knowledge acquisition module specifically comprises:
a retrieval result acquisition unit for obtaining the retrieval results of each utterance element u_i, the retrieval results being the top k results in triple format;
a retrieval result set acquisition unit connected with the retrieval result acquisition unit for obtaining the retrieval result set of the utterance at time t based on the retrieval results of the utterance elements u_i;
a prior knowledge acquisition unit connected with the retrieval result set acquisition unit for merging each retrieval result in the set based on the retrieval result set to obtain the prior knowledge of the utterance at time t.
CN202210676215.XA 2022-06-15 2022-06-15 Long-dialog emotion detection method and system based on common sense knowledge graph Pending CN115033695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210676215.XA CN115033695A (en) 2022-06-15 2022-06-15 Long-dialog emotion detection method and system based on common sense knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210676215.XA CN115033695A (en) 2022-06-15 2022-06-15 Long-dialog emotion detection method and system based on common sense knowledge graph

Publications (1)

Publication Number Publication Date
CN115033695A true CN115033695A (en) 2022-09-09

Family

ID=83124422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210676215.XA Pending CN115033695A (en) 2022-06-15 2022-06-15 Long-dialog emotion detection method and system based on common sense knowledge graph

Country Status (1)

Country Link
CN (1) CN115033695A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905518A (en) * 2022-10-17 2023-04-04 华南师范大学 Emotion classification method, device and equipment based on knowledge graph and storage medium
CN115905518B (en) * 2022-10-17 2023-10-20 华南师范大学 Emotion classification method, device, equipment and storage medium based on knowledge graph

Similar Documents

Publication Publication Date Title
CN110532355B (en) Intention and slot position joint identification method based on multitask learning
US10347244B2 (en) Dialogue system incorporating unique speech to text conversion method for meaningful dialogue response
CN114973062B (en) Multimode emotion analysis method based on Transformer
CN107993665B (en) Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system
Santhanavijayan et al. A semantic-aware strategy for automatic speech recognition incorporating deep learning models
CN114694076A (en) Multi-modal emotion analysis method based on multi-task learning and stacked cross-modal fusion
CN110211594B (en) Speaker identification method based on twin network model and KNN algorithm
CN109446331A (en) A kind of text mood disaggregated model method for building up and text mood classification method
CN113094578A (en) Deep learning-based content recommendation method, device, equipment and storage medium
CN112101044B (en) Intention identification method and device and electronic equipment
CN113178193A (en) Chinese self-defined awakening and Internet of things interaction method based on intelligent voice chip
CN115640530A (en) Combined analysis method for dialogue sarcasm and emotion based on multi-task learning
CN116304973A (en) Classroom teaching emotion recognition method and system based on multi-mode fusion
Bluche et al. Predicting detection filters for small footprint open-vocabulary keyword spotting
CN112417132A (en) New intention recognition method for screening negative samples by utilizing predicate guest information
CN115910066A (en) Intelligent dispatching command and operation system for regional power distribution network
Huang et al. Speech emotion recognition using convolutional neural network with audio word-based embedding
CN115033695A (en) Long-dialog emotion detection method and system based on common sense knowledge graph
Farooq et al. Mispronunciation detection in articulation points of Arabic letters using machine learning
Ananthi et al. Speech recognition system and isolated word recognition based on Hidden Markov model (HMM) for Hearing Impaired
CN114360584A (en) Phoneme-level-based speech emotion layered recognition method and system
CN114254096A (en) Multi-mode emotion prediction method and system based on interactive robot conversation
CN117056506A (en) Public opinion emotion classification method based on long-sequence text data
CN115376547B (en) Pronunciation evaluation method, pronunciation evaluation device, computer equipment and storage medium
CN116434786A (en) Text-semantic-assisted teacher voice emotion recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination