CN116484004A - Dialogue emotion recognition and classification method - Google Patents

Dialogue emotion recognition and classification method

Info

Publication number
CN116484004A
CN116484004A
Authority
CN
China
Prior art keywords
emotion
sentence
node
statement
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310607292.4A
Other languages
Chinese (zh)
Other versions
CN116484004B (en)
Inventor
徐博
李龙娇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202310607292.4A
Publication of CN116484004A
Application granted
Publication of CN116484004B
Status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Clustering; Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a dialogue emotion recognition and classification method, which comprises the steps of: preprocessing a dialogue emotion recognition data set; extracting sentence-level features from the sentences in the preprocessed dialogue emotion recognition data set; extracting emotion-induced events from the sentences in the preprocessed data set to obtain semantic representations of the emotion-induced events; constructing an emotion-psychological-characterization heterogeneous session graph from the sentence-level features of the sentences and the semantic representations of the emotion-induced events, and obtaining a feature matrix of the node input features and adjacency matrices of the edge connections between nodes; extracting session-level features from the feature matrix and adjacency matrices through a graph encoder; and concatenating the sentence-level features with the session-level features and inputting the result into a feedforward neural network to obtain the emotion classification. The invention can better identify the emotion categories in a dialogue and improves the accuracy of dialogue emotion recognition.

Description

Dialogue emotion recognition and classification method
Technical Field
The invention belongs to the technical field of intelligent recognition, and particularly discloses a dialogue emotion recognition and classification method.
Background
Natural language, as the main communication medium of human beings, plays a very important role in the process of emotion generation, so emotion recognition has broad application prospects in fields such as opinion mining, social media, and recommendation systems. In recent years, in order to develop artificial intelligence capable of understanding human emotion, conversational emotion recognition (ERC) has attracted the attention of natural language processing researchers and become a popular research field. Conversational emotion recognition aims to identify the emotion of each utterance in a conversation, which helps to generate emotion-aware dialogue and to develop empathetic conversational agents or chatbots for psychological counseling. In addition, ERC has potential applications in social media analysis, opinion mining, consumer feedback in real-time conversations, legal decisions, e-health medical services, education, and so on. Conversational emotion recognition differs from emotion recognition on traditional text such as sentences and documents: it requires not only the semantic information of the utterances themselves but also context modeling for each utterance. Early studies of conversational emotion recognition, such as lexicon-based and deep-learning-based methods, ignored conversation-specific factors such as contextual cues, the chronological order of utterances, and speaker-specific information. Recent studies use sequence-based or graph-based approaches to model, as far as possible, the interactions between the utterance context and the speakers. Building on these studies, researchers have incorporated variables that affect dialogue emotion, such as commonsense knowledge, psychological knowledge, dialogue acts, and topics, into their models.
While previous studies have made great progress on the conversational emotion recognition task, they ignore the effect on emotion recognition of the psychological characterizations that trigger the speaker's emotion. Methods that model mental states with an external commonsense knowledge base depend heavily on the size, coverage, and construction quality of that knowledge base; and although pre-trained models help to generate commonsense knowledge for events unseen by the knowledge base, the common sense of every person in real life is not necessarily the same. The psychologist Rainer Reisenzein studies language-emotion interactions from the standpoint of a computational psychological-characterization theory of emotion, holding that emotion depends not only on cognitive psychological characterizations (i.e., cognitive or information states) but also on motivational psychological characterizations (i.e., motivational states). It can be said that the psychological-characterization theory of emotion on which the computational theory rests is itself an explication of the commonsense psychology of emotion underlying language. Humans possess complex psychological characterizations: in daily conversational communication one can know that, for a speaker, "a psychological characterization has just been confirmed by a certain event" or "a psychological characterization has just been fulfilled by a certain event", so emotions about states of affairs result from the confirmation or disconfirmation of a cognitive characterization and the fulfillment or frustration of a motivational characterization. Psychological characterizations are related to emotion not only causally but also semantically: the belief that p, the desire for p, and the pleasure about p share a common object, for all of them involve p. Therefore, an emotion classification method that, based on the psychological-characterization theory of emotion, applies to sentences a heterogeneous graph neural network containing sentences, speakers, and emotion-induced events can better identify the emotion in a dialogue.
Disclosure of Invention
The invention provides a dialogue emotion recognition and classification method to solve the problems that existing dialogue emotion recognition and classification methods ignore the effect on emotion recognition of the psychological characterizations that trigger the speaker's emotion and suffer from low classification accuracy.
The invention provides a dialogue emotion recognition and classification method, which comprises the following steps:
s1, preprocessing a dialogue emotion recognition data set, removing irrelevant information of sentences in the dialogue emotion recognition data set, and improving the quality of texts;
s2, extracting sentence-level features of the sentences in the dialogue emotion recognition data set after preprocessing in the step S1 to obtain sentence-level features of the sentences;
s3, extracting emotion triggering events from the sentences in the dialogue emotion recognition data set preprocessed in the step S1 to obtain semantic representations of the emotion triggering events;
s4, constructing a emotion-psychology characterization heterogeneous session graph through the sentence level features of the sentences obtained in the step S2 and the semantic representation of the emotion-induced events obtained in the step S3, and obtaining a feature matrix of each node input feature and an adjacent matrix of the edge connection relationship between the nodes;
s5, extracting session-level features of the feature matrix and the adjacent matrix obtained in the step S4 through a graph encoder to obtain session-level features;
s6, fully connecting the sentence-level features obtained in the step S2 and the session-level features obtained in the step S5, and inputting the fully connected result into a feedforward neural network to obtain emotion classification.
Still further, in step S1, the dialogue emotion recognition data set includes one or more of IEMOCAP, DailyDialog, MELD, and EmoryNLP; the preprocessing includes one or more of removing abbreviations, removing non-alphabetic symbols, removing special symbols, removing abbreviations of proper nouns, and removing redundant space characters.
Furthermore, in step S2, sentence-level feature extraction is performed on the sentences in the dialogue emotion recognition data set using the language model RoBERTa-Large. A [CLS] token is prepended to each sentence $u_i$ in the data set, giving the input sequence $[\mathrm{CLS}], w_1, w_2, \cdots, w_L$, which is input into RoBERTa to obtain the sentence-level feature $h_{u_i}$, as shown in formula (1):
$h_{u_i} = \mathrm{RoBERTa}([\mathrm{CLS}], w_1, w_2, \cdots, w_L)$ (1)
where $w_L$ denotes the $L$-th word of sentence $u_i$.
Further, the language model RoBERTa-Large used in step S2 has a 24-layer architecture with 16 self-attention heads in each block, a hidden dimension of 1024, and 355M parameters in total.
Further, the step S3 includes:
s301, dividing sentences in the dialogue emotion recognition data set into simple clauses according to subordinate conjunctions and conjunctions;
s302, designing a plurality of event modes to match and extract emotion guidingEvent, find sentence u i The method comprises the steps of including the dependency relationship of simple clauses of each verb v, then matching the dependency relationship with a plurality of designed event modes one by one, and for each mode, taking the verb v as a starting point, finding all positive dependency relationship sides, wherein the positive dependency relationship sides and words connected by the positive dependency relationship sides are potential sides and words of effective emotion triggering events;
s303, adding optional dependency edges and words connected by the optional dependency edges through the dependency to form a dependency graph;
s304, checking whether a negative dependency relationship side can be found in the dependency graph, if not, reserving the current dependency relationship side and the word as effective emotion triggering events, otherwise, not reserving;
s305, encoding the extracted effective emotion-induced events by using RoBERTa, and obtaining the maximum pooling of the last layer of hidden states to obtain semantic representation of the emotion-induced eventsAs shown in formula (2):
where e is a valid emotion-inducing event.
Further, the step S4 includes:
s401, constructing statement nodes, speaker nodes and emotion triggering event nodes of an emotion-psychological characterization heterogeneous session map; each target sentence in the dialogue is used as a sentence node, and the characteristics of the sentence node are initializedIs a statement level feature of a statement, as shown in equation (3):
each speech in the dialogueWith a speaker as a speaker node, feature initialization of the speaker nodeThe average of the semantic features of all the expressed sentences of the speaker in the conversation is shown as formula (4):
wherein avg () is an averaging function;
using the emotion-induced event extracted from each sentence in the dialogue as an emotion-induced event node, and initializing the characteristics of the emotion-induced event nodeAs shown in formula (5):
the node set is shown in formula (6):
V=u i ∪Unique(s j )∪Unique(e y ) (6)
wherein V is a node set, u i Is the ith statement node, s j For the j-th speaker node, e y For the y-th emotion-induced event node, unique () is a deduplication function;
s402, constructing statement-statement edges, statement-speaker edges and statement-emotion triggering event edges of emotion-psychological representation heterogeneous session diagrams; connecting each target sentence with the last sentence of all speakers before it, the sentence-sentence edge modeling the effect of past sentences on the current sentence, the sentence-sentence edge E uu As shown in formula (7):
E uu =(u i ,u t ),t>i (7)
wherein u is i For the ith statement node, u t Is the t statement node;
concatenating each target statement with theA speaker corresponding to a sentence, the sentence-speaker edge modeling the effect of the speaker on the sentence, the sentence-speaker edge E su As shown in formula (8):
E su =(s j ,u i ) (8)
wherein s is j For the j-th speaker node, u i Is the ith statement node;
connecting each target sentence with an emotion-induced event extracted from the sentence, the sentence-emotion-induced event edge modeling the influence of psychometric information of the emotion-induced event on the emotion, the sentence-emotion-induced event edge being as shown in formula (9):
E eu =(e t ,u i ) (9)
wherein e t For the t-th emotion-induced event node, u i Is the ith statement node;
the edge set is shown in formula (10):
E=E uu ∪E su ∪E eu (10)
wherein E is an edge set;
s403, after constructing the heterogeneous session graph, obtaining a feature matrix X representing the input features of each node and an adjacency matrix { A } of the connection relationship of the edges between the nodes k The feature matrix X is an N X d-dimensional matrix formed by the features of each node, N is the number of all nodes, d is the feature vector dimension of each node, and the adjacent matrix { A } k N x N dimensional matrix set representing edge relationships between nodes, A k Is the adjacency matrix of the kth dependency edge.
Further, the step S5 includes:
s501, flexibly selecting an adjacent matrix from an adjacent matrix set A of a heterogeneous graph G by using a layer-I graph conversion layer, and passing through two selected adjacent matrices A 1 And A 2 Learning a new primitive path graph, learning different node representations through a plurality of different heterogeneous graph G structures, learning a plurality of primitive path graphs after stacking l-layer graph conversion layers, and performing graph convolution neural on each primitive path graphThe network carries out graph convolution, and the propagation mode between layers of the graph convolution neural network is shown as a formula (11):
wherein X is (l+1) For the feature matrix of layer l +1, sigma is a nonlinear activation function,a is an adjacent matrix, I is an identity matrix,>is->W is a trainable weight matrix sharing a cross-channel, W ε R d×d Is a real matrix of d x d dimensions, d being the feature vector dimension of each node;
s502, fully connecting a plurality of node representations from the same graph convolution neural network on a plurality of element path graphs to obtain session-level features of sentence nodes, wherein the session-level features are shown in a formula (12):
wherein H is a session level feature, ||is a full connection operation, C is the number of channels,is from->Adjacent matrix of the ith channel, +.>For the adjacency matrix of the first layer, +.>Is->W is a trainable weight matrix sharing a cross-channel.
Further, in step S6, the sentence-level feature and the session-level feature of the sentence node are concatenated, as shown in formula (13):
$z_i = h_{u_i} \,\Vert\, H_i$ (13)
where $\Vert$ is the concatenation operation, $h_{u_i}$ is the sentence-level feature of sentence $u_i$, and $H_i$ is the session-level feature of sentence $u_i$;
the concatenated result is input into a feedforward neural network, which is trained and optimized with a cross-entropy loss function and the Adam optimizer to finally obtain the emotion classification result, as shown in formulas (14) and (15):
$p_{x,i} = \mathrm{Softmax}(W_z z_i + b_z)$ (14)
$y_{x,i} = \mathrm{Argmax}(p_{x,i})$ (15)
where $y_{x,i}$ is the predicted emotion label of the $i$-th sentence in dialogue $x$, $z_i$ is the final sentence representation, $W_z$ and $b_z$ are trainable parameters, and $p_{x,i}$ is the predicted probability distribution over the emotion labels of the $i$-th sentence in dialogue $x$.
The dialogue emotion recognition and classification method provided by the invention models the semantic information of sentences, the context and sequential information of the dialogue, the global information of speakers, and the psychological characterizations related to emotion-induced events, and takes into account the effect on emotion recognition of the psychological characterizations that trigger the speaker's emotion, so that the emotion categories of dialogue sentences can be better identified and the accuracy of dialogue emotion recognition is improved.
Drawings
Fig. 1 is a flow chart of a dialog emotion recognition and classification method according to the present invention.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are illustrative of the invention but are not intended to limit the scope of the invention.
A dialogue emotion recognition and classification method is shown in fig. 1, and comprises the following steps:
s1, preprocessing a dialogue emotion recognition data set, removing irrelevant information of sentences in the dialogue emotion recognition data set, and improving the quality of a text;
in particular, the dialog emotion recognition dataset includes one or more of IEMOCAP, dailyDialog, MELD and EmoryNLP; the preprocessing includes one or more of removing abbreviations, removing non-alphabetic symbols, removing special symbols, removing abbreviations for proper nouns, and removing redundant space characters.
S2, extracting sentence-level features of the sentences in the dialogue emotion recognition data set preprocessed in the step S1 to obtain sentence-level features of the sentences;
specifically, the RoBERTa-Large is used for extracting sentence-level features of sentences in the dialogue emotion recognition data set, and the method is specifically implemented as follows: the sentences in the dialogue emotion recognition data set are stored in dictionary types, and text and speaker are respectively corresponding to sentence texts and corresponding speaker names in keys of each sentence in the dialogue. Each sentence u in the dialog emotion recognition dataset using the language model Roberta-Large i Beginning of add mark [ CLS ]]Let the input sequence be [ CLS ]],w 1 ,w 2 ,···,w L Inputting the semantic level representation of the Roberta-Large extraction statement by using a transformer library of Huggingface, and performing word segmentation pretreatment by using a RobertaTokenizer class and a from_preimpregnated () method to return a PyTorch tensor; output using RobertaModel class and from_preimpregnated method to obtain [ CLS ]]The sentence-level features serving as sentences are embedded in the pooling of 1024-dimensional hidden states of the last layer to obtain the sentence-level featuresAs shown in formula (1):
wherein w is L Representation sentence u i Is the L-th word of (2).
The language model RoBERTa-Large used has a 24-layer architecture with 16 self-attention heads in each block, a hidden dimension of 1024, and a total of 355M parameters.
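The description above maps directly onto the transformers classes it names; the sketch below follows that recipe (the pretrained handle "roberta-large" and the use of the first-position hidden state come from the text, while the wrapper function itself is an assumed minimal implementation):

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
encoder = RobertaModel.from_pretrained("roberta-large")
encoder.eval()

def sentence_feature(utterance: str) -> torch.Tensor:
    """Return the 1024-dim sentence-level feature of formula (1).

    RobertaTokenizer prepends RoBERTa's <s> token, which plays the role of
    the [CLS] marker described in step S2.
    """
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # shape (1, L+2, 1024)
    return hidden[0, 0]                               # hidden state at the [CLS]/<s> position

h_u = sentence_feature("I can't believe we actually won!")
print(h_u.shape)  # torch.Size([1024])
```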
S3, extracting emotion-induced events from sentences in the dialogue emotion recognition data set preprocessed in the step S1 to obtain semantic representation of the emotion-induced events;
speech communication is the primary source of general and specific background psychological characterizations required to calculate a psychological characterization of a specific emotion-causing event, which is semantically related to emotion, and thus extracting the causing emotional event from the target utterance to determine the confirmation or non-confirmation of the psychological characterization and satisfaction or frustration of the psychological characterization. To ensure that all extracted emotion-induced events are semantically complete and not overly complex, 18 designed event patterns are used to extract emotion-induced events by pattern matching, the 18 event patterns are shown in table 1:
Table 1. The 18 event patterns (the table body is an image in the original and is not reproduced here)
Each pattern contains three kinds of dependency edges: positive dependency edges, optional dependency edges, and negative dependency edges. Six further dependencies, advmod, amod, nummod, aux, compound, and neg, are optional dependency edges that may attach to any selected pattern. All positive and optional dependency edges that do not belong to the current pattern are treated as negative dependency edges, with the aim that every extracted emotion-induced event is semantically complete and all patterns are mutually exclusive and independent of one another.
Specifically, step S3 includes:
s301, considering that a sentence possibly contains a plurality of emotion triggering events, dividing the sentence into simple clauses according to a component tree, and following an utterance analysis system to detect possible separators by using a connection classifier, thereby dividing the sentence in the dialogue emotion recognition data set into simple clauses according to subordinate conjunctions and conjunctions;
s302, designing a plurality of event modes to match and extract emotion-induced events, and finding a sentence u i The method comprises the steps of including the dependency relationship of simple clauses of each verb v, then matching the dependency relationship with a plurality of designed event modes one by one, and for each mode, taking the verb v as a starting point, finding all positive dependency relationship sides, wherein the positive dependency relationship sides and words connected by the positive dependency relationship sides are potential sides and words of effective emotion triggering events;
s303, adding optional dependency edges and words connected by the optional dependency edges through the dependency to form a dependency graph;
s304, checking whether a negative dependency relationship side can be found in the dependency graph, if not, reserving the current dependency relationship side and the word as effective emotion triggering events, otherwise, not reserving;
s305, encoding the extracted effective emotion-induced events by using RoBERTa, and obtaining the maximum pooling of the last layer of hidden states to obtain semantic representation of the emotion-induced eventsAs shown in formula (2):
where e is a valid emotion-inducing event.
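A simplified sketch of the pattern matching of steps S301-S304, written with spaCy's dependency parser. The two patterns below merely stand in for the 18 designed event patterns of Table 1, and treating every non-pattern, non-optional edge as negative follows the description above; all names here are assumptions:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the en_core_web_sm model is installed

# stand-ins for the 18 designed patterns (each is a set of positive dependency edges)
PATTERNS = [frozenset({"nsubj", "dobj"}), frozenset({"nsubj", "xcomp"})]
OPTIONAL = {"advmod", "amod", "nummod", "aux", "compound", "neg"}

def extract_events(clause: str) -> list[str]:
    """Return candidate emotion-induced events: a verb plus its pattern arguments."""
    events = []
    for token in nlp(clause):
        if token.pos_ != "VERB":
            continue
        deps = {child.dep_ for child in token.children}
        for pattern in PATTERNS:
            if not pattern <= deps:
                continue                  # positive edges of this pattern are absent
            if deps - pattern - OPTIONAL:
                continue                  # a negative edge is present: discard candidate
            keep = [token] + [c for c in token.children
                              if c.dep_ in pattern or c.dep_ in OPTIONAL]
            events.append(" ".join(t.text for t in sorted(keep, key=lambda t: t.i)))
    return events

print(extract_events("She finally passed the bar exam"))
```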
S4, constructing an emotion-psychological-characterization heterogeneous session graph from the sentence-level features obtained in step S2 and the semantic representations of the emotion-induced events obtained in step S3, and obtaining a feature matrix of the node input features and adjacency matrices of the edge connections between nodes;
specifically, step S4 includes:
s401, constructing statement nodes, speaker nodes and emotion triggering event nodes of an emotion-psychological characterization heterogeneous session map; each target sentence in the dialogue is used as a sentence node, and the characteristics of the sentence node are initializedIs a statement level feature of a statement, as shown in equation (3):
each speaker in the conversation is used as a speaker node, and the characteristics of the speaker node are initializedThe average of the semantic features of all the expressed sentences of the speaker in the conversation is shown as formula (4):
wherein avg () is an averaging function;
using the emotion-induced event extracted from each sentence in the dialogue as an emotion-induced event node, and initializing the characteristics of the emotion-induced event nodeAs shown in formula (5):
the node set is shown in formula (6):
V=u i ∪Unique(s j )∪Unique(e y ) (6)
wherein V is a node set, u i Is the ith statement node, s j For the j-th speaker node, e y For the y-th emotion-induced event node, unique () is a deduplication function;
s402, constructing statement-statement edges, statement-speaker edges and statement-emotion triggering event edges of emotion-psychological representation heterogeneous session diagrams; we consider that the last sentence of each speaker before the target sentence has the greatest effect on the context of the target sentence, and that the other effects are smaller, and furthermore, it is notable that the edges between sentences are unidirectional, and that in real life, the emotion expressed by the current utterance is only affected by the previous utterance. So when constructing a statement-statement edge, connect each target statement with the last statement of all speakers before it, the statement-statement edge models the effect of past statements on the current statement, statement-statement edge E uu As shown in formula (7):
E uu =(u i ,u t ),t>i (7)
wherein u is i For the ith statement node, u t Is the t statement node;
connecting each target sentence and the corresponding speaker of the sentence, modeling the influence of the speaker on the sentence by the sentence-speaker side, and modeling the effect of the speaker on the sentence-speaker side E su As shown in formula (8):
E su =(s j ,u i ) (8)
wherein s is j For the j-th speaker node, u i Is the ith statement node;
connecting each target sentence with the emotion-induced event extracted from the sentence, and modeling the influence of psychological characterization information of the emotion-induced event on the emotion by the sentence-emotion-induced event edge, wherein the sentence-emotion-induced event edge is shown in a formula (9):
E eu =(e t ,u i ) (9)
wherein e t For the t-th emotion-induced event node, u i Is the ith statement node;
the edge set is shown in formula (10):
E=E uu ∪E su ∪E eu (10)
wherein E is an edge set;
s403, after constructing the heterogeneous session graph, obtaining a feature matrix X representing the input features of each node and an adjacency matrix { A } of the connection relationship of the edges between the nodes k The feature matrix X is an N X d-dimensional matrix formed by the features of each node, N is the number of all nodes, d is the feature vector dimension of each node, and the adjacent matrix { A } k N x N dimensional matrix set representing edge relationships between nodes, A k Is the adjacency matrix of the kth dependency edge.
S5, extracting session-level features from the feature matrix and adjacency matrices obtained in step S4 through a graph encoder;
specifically, step S5 includes:
s501, constructing Graph Transformer Network by using PyToch frame, setting C convolved output channels to simultaneously consider multiple element paths, flexibly selecting an adjacent matrix from an adjacent matrix set A of a heterogeneous graph G by using a layer-I graph conversion layer, and passing through two selected adjacent matrices A 1 And A 2 Learning a new element path diagram, learning different node representations through a plurality of different heterogeneous diagram G structures, learning a plurality of element path diagrams after stacking l layers of diagram conversion layers, and performing diagram convolution on a diagram convolution neural network on each element path diagram, wherein the propagation mode between the layers of the diagram convolution neural network is shown as a formula (11):
wherein X is (l+1) For the feature matrix of layer l +1, sigma is a nonlinear activation function,a is an adjacent matrix, I is an identity matrix,>is->W is a trainable weight matrix sharing a cross-channel, W ε R d×d Is a real matrix of d x d dimensions, d being the feature vector dimension of each node;
s502, fully connecting a plurality of node representations from the same graph convolution neural network on a plurality of element path graphs to obtain session-level features of sentence nodes, wherein the session-level features are shown in a formula (12):
wherein H is a session level feature, ||is a full connection operation, C is the number of channels,is from->Adjacent matrix of the ith channel, +.>For the adjacency matrix of the first layer, +.>Is->W is a trainable weight matrix sharing a cross-channel.
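A minimal sketch of the propagation rule of formula (11) and the channel concatenation of formula (12); the soft meta-path selection performed by the graph transformer layers is omitted here, the meta-path adjacency matrices are assumed to be given, and ReLU stands in for the unspecified nonlinearity σ:

```python
import torch
import torch.nn as nn

class MetaPathGCN(nn.Module):
    """Shared-weight graph convolution over C meta-path graphs (formulas (11)-(12))."""

    def __init__(self, d: int):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)            # trainable W shared across channels

    def propagate(self, A: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        A_tilde = A + torch.eye(A.size(0))              # A~ = A + I
        D_inv = torch.diag(1.0 / A_tilde.sum(dim=1))    # D~^{-1}
        return torch.relu(D_inv @ A_tilde @ self.W(X))  # sigma(D~^{-1} A~ X W), formula (11)

    def forward(self, meta_path_adjs: list, X: torch.Tensor) -> torch.Tensor:
        # formula (12): concatenate the outputs of the C channels
        return torch.cat([self.propagate(A, X) for A in meta_path_adjs], dim=-1)
```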
S6, concatenating the sentence-level features obtained in step S2 with the session-level features obtained in step S5, and inputting the concatenated result into a feedforward neural network to obtain the emotion classification.
Specifically, the sentence-level feature and the session-level feature of the sentence node are concatenated, as shown in formula (13):
$z_i = h_{u_i} \,\Vert\, H_i$ (13)
where $\Vert$ is the concatenation operation, $h_{u_i}$ is the sentence-level feature of sentence $u_i$, and $H_i$ is the session-level feature of sentence $u_i$;
the concatenated result is input into a feedforward neural network, which is trained and optimized with a cross-entropy loss function and the Adam optimizer to finally obtain the emotion classification result, as shown in formulas (14) and (15):
$p_{x,i} = \mathrm{Softmax}(W_z z_i + b_z)$ (14)
$y_{x,i} = \mathrm{Argmax}(p_{x,i})$ (15)
where $y_{x,i}$ is the predicted emotion label of the $i$-th sentence in dialogue $x$, $z_i$ is the final sentence representation, $W_z$ and $b_z$ are trainable parameters, and $p_{x,i}$ is the predicted probability distribution over the emotion labels of the $i$-th sentence in dialogue $x$.
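The classification step then amounts to a concatenation followed by a linear feedforward layer trained with cross-entropy and Adam; a sketch under the assumption that a single linear layer realizes W_z and b_z (dimensions and label count are illustrative):

```python
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """Concatenate sentence- and session-level features and classify, formulas (13)-(15)."""

    def __init__(self, d_sent: int, d_sess: int, n_emotions: int):
        super().__init__()
        self.ffn = nn.Linear(d_sent + d_sess, n_emotions)   # realizes W_z, b_z

    def forward(self, h_u: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        z = torch.cat([h_u, H], dim=-1)                     # z_i = h_{u_i} || H_i, formula (13)
        return self.ffn(z)                                  # logits for formula (14)

model = EmotionClassifier(d_sent=1024, d_sess=1024, n_emotions=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()   # applies the Softmax of formula (14) internally
# prediction for formula (15): model(h_u, H).argmax(dim=-1)
```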
The training set data are used to train the method of this embodiment, and results are verified on the validation and test sets; the method of this embodiment is compared with the sequence-based dialogue emotion recognition and classification method DialogueRNN, the graph-based dialogue emotion recognition and classification methods RGAT-POS and DAG-ERC, and the commonsense-knowledge-based dialogue emotion recognition algorithms COSMIC and SKAIG-ERC, giving the results in Table 2:
table 2 comparison of experimental results of different algorithms
Performance on the data sets IEMOCAP and EmoryNLP is evaluated with weighted-F1, where a higher value indicates better classification of the samples. On both data sets the weighted-F1 of the method of this embodiment exceeds that of DialogueRNN, RGAT-POS, DAG-ERC, COSMIC, and SKAIG-ERC: it is 0.19 higher than the best baseline, DAG-ERC, on IEMOCAP and 1.3 higher than the best baseline, DAG-ERC, on EmoryNLP. Because the DailyDialog data set contains a large amount of data with neutral labels, performance on DailyDialog is evaluated with micro-F1 computed excluding sentences labeled neutral; micro-F1 is the proportion of correctly classified samples, and a higher value indicates better classification. On DailyDialog the micro-F1 of the method of this embodiment again exceeds that of DialogueRNN, RGAT-POS, DAG-ERC, COSMIC, and SKAIG-ERC, scoring 0.05 higher than the best baseline on that data set, SKAIG-ERC. It can therefore be seen that the method provided by this embodiment achieves a better classification effect, i.e., more accurate emotion recognition and classification results.
Inspired by the psychological-characterization theory of emotion in psychology, the method of this embodiment takes the effect of psychological characterization on emotion into account: it preprocesses the dialogue data, extracts sentence-level features with a pre-trained model, extracts the emotion-induced events in the sentences, and builds the sentences, speakers, and emotion-induced events into a heterogeneous graph that models the contextual interaction, sequential information, speaker information, and psychological-characterization information of the emotion-induced events of the dialogue; it then extracts session-level features with a graph transformer network and finally classifies emotion by combining the sentence-level and session-level features of the sentences, thereby realizing emotion recognition for the sentences in a dialogue. By modeling the semantic information of sentences, the context and sequential information of the dialogue, speaker-specific information, and the psychological characterizations related to emotion-induced events, the emotion categories of sentences in a dialogue can be better identified, and the accuracy of the dialogue emotion recognition results is ensured.
The embodiments of the invention have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (8)

1. A dialog emotion recognition and classification method, characterized by comprising the steps of:
s1, preprocessing a dialogue emotion recognition data set, and removing irrelevant information of sentences in the dialogue emotion recognition data set;
s2, extracting sentence-level features of the sentences in the dialogue emotion recognition data set after preprocessing in the step S1 to obtain sentence-level features of the sentences;
s3, extracting emotion triggering events from the sentences in the dialogue emotion recognition data set preprocessed in the step S1 to obtain semantic representations of the emotion triggering events;
s4, constructing a emotion-psychology characterization heterogeneous session graph through the sentence level features of the sentences obtained in the step S2 and the semantic representation of the emotion-induced events obtained in the step S3, and obtaining a feature matrix of each node input feature and an adjacent matrix of the edge connection relationship between the nodes;
s5, extracting session-level features of the feature matrix and the adjacent matrix obtained in the step S4 through a graph encoder to obtain session-level features;
s6, fully connecting the sentence-level features obtained in the step S2 and the session-level features obtained in the step S5, and inputting the fully connected result into a feedforward neural network to obtain emotion classification.
2. The dialogue emotion recognition and classification method as claimed in claim 1, wherein in step S1 the dialogue emotion recognition data set includes one or more of IEMOCAP, DailyDialog, MELD, and EmoryNLP; and the preprocessing includes one or more of removing abbreviations, removing non-alphabetic symbols, removing special symbols, removing abbreviations of proper nouns, and removing redundant space characters.
3. The dialogue emotion recognition and classification method as claimed in claim 1, wherein in step S2 sentence-level feature extraction is performed on the sentences in the dialogue emotion recognition data set using the language model RoBERTa-Large: a [CLS] token is prepended to each sentence $u_i$ in the data set, giving the input sequence $[\mathrm{CLS}], w_1, w_2, \cdots, w_L$, which is input into RoBERTa to obtain the sentence-level feature $h_{u_i}$, as shown in formula (1):
$h_{u_i} = \mathrm{RoBERTa}([\mathrm{CLS}], w_1, w_2, \cdots, w_L)$ (1)
where $w_L$ denotes the $L$-th word of sentence $u_i$.
4. The dialogue emotion recognition and classification method as claimed in claim 3, wherein the language model RoBERTa-Large used in step S2 has a 24-layer architecture with 16 self-attention heads in each block, a hidden dimension of 1024, and 355M parameters in total.
5. The dialogue emotion recognition and classification method as claimed in claim 3, wherein step S3 comprises:
s301, dividing sentences in the dialogue emotion recognition data set into simple clauses according to subordinate conjunctions and conjunctions;
s302, designing a plurality of event modes to match and extract emotion-induced events, and finding a sentence u i The method comprises the steps of including the dependency relationship of simple clauses of each verb v, then matching the dependency relationship with a plurality of designed event modes one by one, and for each mode, taking the verb v as a starting point, finding all positive dependency relationship sides, wherein the positive dependency relationship sides and words connected by the positive dependency relationship sides are potential sides and words of effective emotion triggering events;
s303, adding optional dependency edges and words connected by the optional dependency edges through the dependency to form a dependency graph;
s304, checking whether a negative dependency relationship side can be found in the dependency graph, if not, reserving the current dependency relationship side and the word as effective emotion triggering events, otherwise, not reserving;
s305, encoding the extracted effective emotion-induced events by using RoBERTa, and obtaining the maximum pooling of the last layer of hidden states to obtain semantic representation of the emotion-induced eventsAs shown in formula (2):
where e is a valid emotion-inducing event.
6. The method of claim 5, wherein the step S4 includes:
s401, constructing statement nodes, speaker nodes and emotion triggering event nodes of an emotion-psychological characterization heterogeneous session map; each target sentence in the dialogue is used as a sentence node, and the characteristics of the sentence node are initializedIs a statement level feature of a statement, as shown in equation (3):
each speaker in the conversation is used as a speaker node, and the characteristics of the speaker node are initializedThe average of the semantic features of all the expressed sentences of the speaker in the conversation is shown as formula (4):
wherein avg () is an averaging function;
using the emotion-induced event extracted from each sentence in the dialogue as an emotion-induced event node, and initializing the characteristics of the emotion-induced event nodeAs shown in formula (5):
the node set is shown in formula (6):
V=u i ∪Unique(s j )∪Unique(e y ) (6)
wherein V is a node set, u i Is the ith statement node, s j For the j-th speaker node, e y For the y-th emotion-induced event node, unique () is a deduplication function;
s402, constructing statement-statement edges, statement-speaker edges and statement-emotion triggering event edges of emotion-psychological representation heterogeneous session diagrams; connecting each target sentence with the last sentence of all speakers before it, the sentence-sentence edge modeling the effect of past sentences on the current sentence, the sentence-sentence edge E uu As shown in formula (7):
E uu =(u i ,u t ),t>i (7)
wherein u is i For the ith statement node, u t Is the t statement node;
connecting each target sentence with the speaker corresponding to the sentence, the sentence-speaker side modeling the speakerInfluence on sentence, the sentence-speaker edge E su As shown in formula (8):
E su =(s j ,u i ) (8)
wherein s is j For the j-th speaker node, u i Is the ith statement node;
connecting each target sentence with an emotion-induced event extracted from the sentence, the sentence-emotion-induced event edge modeling the influence of psychometric information of the emotion-induced event on the emotion, the sentence-emotion-induced event edge being as shown in formula (9):
E eu =(e t ,u i ) (9)
wherein e t For the t-th emotion-induced event node, u i Is the ith statement node;
the edge set is shown in formula (10):
E=E uu ∪E su ∪E eu (10)
wherein E is an edge set;
s403, after constructing the heterogeneous session graph, obtaining a feature matrix X representing the input features of each node and an adjacency matrix { A } of the connection relationship of the edges between the nodes k The feature matrix X is an N X d-dimensional matrix formed by the features of each node, N is the number of all nodes, d is the feature vector dimension of each node, and the adjacent matrix { A } k N x N dimensional matrix set representing edge relationships between nodes, A k Is the adjacency matrix of the kth dependency edge.
7. The method of classifying dialogue emotion recognition according to claim 6, wherein said step S5 includes:
s501, flexibly selecting an adjacent matrix from an adjacent matrix set A of a heterogeneous graph G by using a layer-I graph conversion layer, and passing through two selected adjacent matrices A 1 And A 2 Learning a new element path diagram, learning different node representations through a plurality of different heterogeneous diagram G structures, and after stacking l-layer diagram conversion layers, learning a plurality of element path diagramsPerforming graph convolution on each element path graph by using a graph convolution neural network, wherein the propagation mode between layers of the graph convolution neural network is as shown in a formula (11):
wherein X is (l+1) For the feature matrix of layer l +1, sigma is a nonlinear activation function,a is an adjacent matrix, I is an identity matrix,>is->W is a trainable weight matrix sharing a cross-channel, W ε R d×d Is a real matrix of d x d dimensions, d being the feature vector dimension of each node;
s502, fully connecting a plurality of node representations from the same graph convolution neural network on a plurality of element path graphs to obtain session-level features of sentence nodes, wherein the session-level features are shown in a formula (12):
wherein H is a session level feature, ||is a full connection operation, C is the number of channels,is from->Adjacent matrix of the ith channel, +.>For the adjacency matrix of the first layer, +.>Is->W is a trainable weight matrix sharing a cross-channel.
8. The method of claim 7, wherein in step S6 the sentence-level feature and the session-level feature of the sentence node are concatenated, as shown in formula (13):
$z_i = h_{u_i} \,\Vert\, H_i$ (13)
where $\Vert$ is the concatenation operation, $h_{u_i}$ is the sentence-level feature of sentence $u_i$, and $H_i$ is the session-level feature of sentence $u_i$;
the concatenated result is input into a feedforward neural network, which is trained and optimized with a cross-entropy loss function and the Adam optimizer to finally obtain the emotion classification result, as shown in formulas (14) and (15):
$p_{x,i} = \mathrm{Softmax}(W_z z_i + b_z)$ (14)
$y_{x,i} = \mathrm{Argmax}(p_{x,i})$ (15)
where $y_{x,i}$ is the predicted emotion label of the $i$-th sentence in dialogue $x$, $z_i$ is the final sentence representation, $W_z$ and $b_z$ are trainable parameters, and $p_{x,i}$ is the predicted probability distribution over the emotion labels of the $i$-th sentence in dialogue $x$.
CN202310607292.4A 2023-05-26 2023-05-26 Dialogue emotion recognition and classification method Active CN116484004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310607292.4A CN116484004B (en) 2023-05-26 2023-05-26 Dialogue emotion recognition and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310607292.4A CN116484004B (en) 2023-05-26 2023-05-26 Dialogue emotion recognition and classification method

Publications (2)

Publication Number Publication Date
CN116484004A true CN116484004A (en) 2023-07-25
CN116484004B CN116484004B (en) 2024-06-07

Family

ID=87227059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310607292.4A Active CN116484004B (en) 2023-05-26 2023-05-26 Dialogue emotion recognition and classification method

Country Status (1)

Country Link
CN (1) CN116484004B (en)

Citations (5)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050144013A1 (en) * 2003-11-20 2005-06-30 Jun Fujimoto Conversation control apparatus, conversation control method, and programs therefor
CN106874410A (en) * 2017-01-22 2017-06-20 清华大学 Chinese microblogging text mood sorting technique and its system based on convolutional neural networks
WO2022183138A2 (en) * 2021-01-29 2022-09-01 Elaboration, Inc. Automated classification of emotio-cogniton
CN114722838A (en) * 2022-04-11 2022-07-08 天津大学 Conversation emotion recognition method based on common sense perception and hierarchical multi-task learning
CN114911932A (en) * 2022-04-22 2022-08-16 南京信息工程大学 Heterogeneous graph structure multi-conversation person emotion analysis method based on theme semantic enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘喜凯 (Liu Xikai) et al., "A Dialogue Generation Model Based on a Retrieval Result Fusion Mechanism", Journal of Chinese Information Processing (《中文信息学报》), vol. 35, no. 7, 15 July 2021 (2021-07-15), pages 134-142 *

Also Published As

Publication number Publication date
CN116484004B (en) 2024-06-07

Similar Documents

Publication Publication Date Title
CN109493166B (en) Construction method for task type dialogue system aiming at e-commerce shopping guide scene
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
CN110825881B (en) Method for establishing electric power knowledge graph
Ghosh et al. Fracking sarcasm using neural network
CN110222163B (en) Intelligent question-answering method and system integrating CNN and bidirectional LSTM
CN107025284A (en) The recognition methods of network comment text emotion tendency and convolutional neural networks model
CN109726745B (en) Target-based emotion classification method integrating description knowledge
CN111581350A (en) Multi-task learning, reading and understanding method based on pre-training language model
CN111325029A (en) Text similarity calculation method based on deep learning integration model
CN113254675B (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
CN111581379B (en) Automatic composition scoring calculation method based on composition question-deducting degree
Solomon et al. Understanding the psycho-sociological facets of homophily in social network communities
CN116049387A (en) Short text classification method, device and medium based on graph convolution
CN112685550A (en) Intelligent question answering method, device, server and computer readable storage medium
CN111159405B (en) Irony detection method based on background knowledge
CN112215629B (en) Multi-target advertisement generating system and method based on construction countermeasure sample
Lin et al. Predicting performance outcome with a conversational graph convolutional network for small group interactions
CN116756347B (en) Semantic information retrieval method based on big data
Nair et al. Knowledge graph based question answering system for remote school education
Sheeba et al. A fuzzy logic based on sentiment classification
CN113239143A (en) Power transmission and transformation equipment fault processing method and system fusing power grid fault case base
CN115905187B (en) Intelligent proposition system oriented to cloud computing engineering technician authentication
CN116484004B (en) Dialogue emotion recognition and classification method
CN117216617A (en) Text classification model training method, device, computer equipment and storage medium
CN113722477B (en) Internet citizen emotion recognition method and system based on multitask learning and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant