CN114281954A - Multi-round dialog reply generation system and method based on relational graph attention network - Google Patents

Multi-round dialog reply generation system and method based on relational graph attention network

Info

Publication number
CN114281954A
CN114281954A
Authority
CN
China
Prior art keywords
representation
information
turn
utterance
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111044215.XA
Other languages
Chinese (zh)
Inventor
林菲
钱朝辉
张聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202111044215.XA priority Critical patent/CN114281954A/en
Publication of CN114281954A publication Critical patent/CN114281954A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention belongs to the field of natural language generation in artificial intelligence and discloses a multi-round dialog reply generation system and method based on a relational graph attention network. The method comprises the following steps: acquiring and preprocessing the multi-turn dialog input, obtaining the semantic representation of each turn of utterance, and encoding these utterance representations to obtain the semantic representation of the dialog context; then capturing the self-dependencies of utterances within the multi-turn dialog and the dependencies between interlocutors with a graph attention network, into which a relational position encoding is introduced to account for the sequential information among the utterances, thereby obtaining the high-level semantic representation of the graph coding layer; and finally taking the dialog-context semantic representation and the high-level semantic representation of the relational graph attention encoding as input and decoding with a GRU model to generate the final dialog reply output. The invention significantly improves the quality of multi-turn dialog reply generation, making the generated replies more coherent and meaningful.

Description

Multi-round dialog reply generation system and method based on relational graph attention network
Technical Field
The invention belongs to the field of natural language generation in artificial intelligence, and particularly relates to a multi-round dialog reply generation system and method based on a relational graph attention network.
Background
With the rapid growth of the Internet and social media, massive user dialogue corpora have been generated, providing the conditions for data-driven dialogue systems. The great research and commercial value of intelligent dialogue systems is attracting increasing attention from both academia and industry. Dialogue systems can currently be divided into task-driven, limited-domain dialogue systems and open-domain dialogue systems without a specific task; compared with the former, the latter offer better practicability, extensibility and domain adaptability, so open-domain dialogue systems have gradually become a focus of researchers' attention.
Current methods can be divided, by how the system is implemented, into retrieval models and generative models. A retrieval model uses a selection algorithm to choose a suitable reply from the dialogue corpus; although such replies are grammatically correct and factually grounded, they suffer from problems such as monotonous reply sentences and limited topics. A generative model, by contrast, uses natural language processing techniques to learn and understand the context information input by the user and then generates the corresponding reply word by word. Generative models can in turn be divided into single-turn and multi-turn dialogue generation models according to whether historical dialogue information is considered. Compared with single-turn dialogue generation, multi-turn dialogue generation requires the system to understand complex context information and is therefore more challenging. However, research on open-domain multi-turn dialogue generation still faces many difficulties, such as generic replies, lack of background knowledge and lack of consistency. Improving open-domain multi-turn dialogue generation systems therefore has great research value.
Recent research in this field has mainly been built on sequence-to-sequence frameworks and has focused on how to effectively model the context information. However, previous work rarely considers the utterance dependencies between interlocutors and their temporal information. To solve this problem, the key is how to model the utterance dependencies between interlocutors, while also attending to the temporal information of the interlocutors' utterances. The invention adopts a relational graph attention network to model the utterance dependencies between interlocutors and fully mines the information carried by the different relation types between the interlocutors' utterances; at the same time, a relational position encoding is introduced to capture the sequential information between the interlocutors' utterances, so that the generated replies are more coherent, natural and specific.
Disclosure of Invention
The present invention provides a multi-turn dialog reply generation system and method based on a relational graph attention network to solve the above technical problems.
In order to solve the above technical problems, the specific technical scheme of the multi-round dialog reply generation system and method based on the relational graph attention network is as follows:
A multi-round dialog reply generation system based on a relational graph attention network comprises a sentence input coding layer, a graph coding layer and a decoding layer, wherein the sentence input coding layer comprises a word-level encoder and an utterance-level encoder; the word-level encoder encodes the words in each turn of utterance input to the model, thereby obtaining the semantic representation of that turn of utterance; the utterance-level encoder encodes these utterance representations, thereby obtaining the semantic representation of the entire dialog context; the graph coding layer first captures the self-dependencies of utterances within the multi-turn dialog and the dependencies between interlocutors with a graph attention network, and then introduces a relational position encoding to account for the sequential information among the utterances; the decoding layer generates a response reply based on the input contextual semantic representation and the high-level semantic representation of the graph coding layer.
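The following is a minimal, non-limiting sketch of how the three layers described above could be composed, written in Python with PyTorch (which the patent does not mandate); the sub-module interfaces and names are illustrative assumptions intended only to show the data flow from the sentence input coding layer through the graph coding layer to the decoder.

```python
# Hypothetical composition of the three layers; the sub-modules themselves are
# assumed to exist and are sketched separately further below.
import torch.nn as nn

class RelationalGraphDialogModel(nn.Module):
    def __init__(self, sentence_encoder, graph_encoder, decoder):
        super().__init__()
        self.sentence_encoder = sentence_encoder  # word-level + utterance-level encoders
        self.graph_encoder = graph_encoder        # relational graph attention layers
        self.decoder = decoder                    # GRU decoder

    def forward(self, utterances, speakers, target_ids=None):
        context = self.sentence_encoder(utterances)           # dialog-context representation
        graph_repr = self.graph_encoder(context, speakers)    # high-level semantic representation
        return self.decoder(context, graph_repr, target_ids)  # reply distribution / training loss
```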
The invention also discloses a multi-round dialog reply generation method based on the relational graph attention network, which comprises the following steps:
Step one: acquiring and preprocessing the multi-turn dialog input, converting the word meaning information in each turn of utterance into corresponding vector representations through a pre-trained BERT model to obtain the semantic representation of each turn of utterance, and encoding these utterance representations through a Bi-GRU model to obtain the semantic representation of the dialog context;
Step two: capturing the self-dependencies of utterances within the multi-turn dialog and the dependencies between interlocutors with a graph attention network, and introducing a relational position encoding into the graph attention network to account for the sequential information among the utterances, thereby obtaining the high-level semantic representation of the graph coding layer;
Step three: taking the dialog-context semantic representation and the high-level semantic representation of the relational graph attention encoding as input, and decoding with a GRU model to generate the final dialog reply output.
Further, in step one, preprocessing the multi-turn dialog input and obtaining the semantic representation of each turn of utterance comprises: encoding the words in each input turn of utterance by first applying the BPE algorithm to obtain a sequence-tagged representation of each turn of utterance, and then inputting it into the pre-trained BERT language model for fine-tuning, thereby obtaining the semantic representation of each turn of utterance.
Further, step one encodes with a word-level encoder and an utterance-level encoder, specifically as follows:
For a given multi-turn dialog context of length M, U = {u_1, ..., u_M}, the word-level encoder first applies the BPE algorithm to obtain the sequence-tagged representation of each turn of utterance, u_i = {x_{i,1}, x_{i,2}, ..., x_{i,T_i}}, where T_i is the number of tokens in the i-th turn of utterance, and then inputs it into the pre-trained BERT language model for fine-tuning; the word-level encoding process is expressed as
c_i = BERT(x_{i,1}, x_{i,2}, ..., x_{i,T_i}),
thereby obtaining the semantic representation c_i of each turn of utterance.
A Bi-GRU model is adopted as the utterance-level encoder. The Bi-GRU takes the utterance representations c_1, ..., c_M obtained from the upper word-level encoder as input and encodes each turn of utterance; the utterance-level encoding process is expressed as
h_i^fwd = GRU_fwd(c_i, h_{i-1}^fwd),
h_i^bwd = GRU_bwd(c_i, h_{i+1}^bwd),
where h_i^fwd is the i-th hidden state of the forward GRU and h_i^bwd is the i-th hidden state of the backward GRU; the hidden states of the forward and backward GRUs are concatenated to obtain the context-aware semantic representation h_i = [h_i^fwd; h_i^bwd]. High-level feature information between the utterances of the multi-turn dialog is thus captured through this hierarchical structure.
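A minimal sketch of this hierarchical encoder is given below, assuming PyTorch and a Hugging Face BERT model, neither of which the patent prescribes; the module name HierarchicalEncoder, the hidden sizes, and the use of the [CLS] vector as the utterance representation c_i are illustrative assumptions.

```python
# Hierarchical sentence-input encoder sketch: a BERT word-level encoder per
# utterance followed by a bidirectional GRU utterance-level encoder over the
# whole dialog context.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class HierarchicalEncoder(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", gru_hidden=256):
        super().__init__()
        # WordPiece subword tokenization stands in here for the BPE tokenization in the text
        self.tokenizer = BertTokenizer.from_pretrained(bert_name)
        self.bert = BertModel.from_pretrained(bert_name)  # word-level encoder (fine-tuned with the model)
        self.utterance_gru = nn.GRU(self.bert.config.hidden_size, gru_hidden,
                                    bidirectional=True, batch_first=True)

    def forward(self, utterances):
        # utterances: list of M strings, one per dialog turn
        utt_vectors = []
        for u in utterances:
            enc = self.tokenizer(u, return_tensors="pt", truncation=True)
            out = self.bert(**enc)
            utt_vectors.append(out.last_hidden_state[:, 0])  # [CLS] vector as utterance representation c_i
        c = torch.stack(utt_vectors, dim=1)      # (1, M, hidden)
        h, _ = self.utterance_gru(c)             # (1, M, 2 * gru_hidden): [forward; backward] per turn
        return h


# Usage on a toy three-turn dialog:
encoder = HierarchicalEncoder()
context = encoder(["hi, how are you?", "fine, thanks. you?", "pretty good."])
print(context.shape)  # torch.Size([1, 3, 512])
```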
Further, obtaining the high-level semantic representation of the graph coding layer in step two comprises the following steps:
A directed graph G = (V, E, R) is constructed over the M utterances of the multi-turn dialog and defined as follows: each utterance in the multi-turn dialog is defined as a node v_i ∈ V; the relationship dependency information between utterances is defined as an edge (v_i, r, v_j) ∈ E, where r ∈ R is the relation type; and the weight of an edge is defined as α_ijr.
(1) First, the contextual semantic representation h_i output by the context coding layer is taken as the initial vector representation of node v_i;
(2) the information edges r are constructed over the nodes, and their types are divided into the following four categories: (a) self-before edges r1: dependency between the target utterance and earlier utterances of the same speaker; (b) inter-before edges r2: dependency between the target utterance and earlier utterances of the other speakers; (c) self-after edges r3: dependency between the target utterance and later utterances of the same speaker; (d) inter-after edges r4: dependency between the target utterance and later utterances of the other speakers;
(3) relational position encoding (Relational Position Encodings) is used to capture the temporal information between utterances connected by these four edge types; PE_ijr denotes the relational position encoding between the target utterance u_i and its neighboring utterance u_j under relation type r, taking values within [b, a], where b and a are the sliding-window values of the target utterance with respect to the other utterances, and N_i^r denotes the neighborhood of the target utterance u_i under relation type r (a code sketch of the edge construction and position encoding is given after these steps);
(4) the weights of the relational information edges are calculated: α_ijr denotes the edge weight between the target utterance u_i and its neighboring utterance u_j under relation type r, computed by the attention mechanism combined with the relational position encoding, where W_r is the parameterized weight matrix in the attention mechanism, a_r is the parameterized weight vector, ·^T denotes transposition, and LRL is the LeakyReLU activation function;
(5) the vector representation of each node is updated by aggregating over its neighborhoods N_i^r; the graph propagation is carried out layer by layer with trainable parameter weight matrices, where L is the number of graph convolution layers, and the output of the final layer gives the high-level semantic representation of the graph coding layer.
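The sketch below illustrates, in Python, how the four relation-typed edges and their relational position encodings might be built for a dialog; because the original formulas survive only as images, the clipping of the relative distance to the sliding window [b, a] and the default window values are assumptions, not the patent's exact definition.

```python
# Illustrative construction of the four relation-typed edges and their
# relational position encodings for a multi-turn dialog.
def build_relational_edges(speakers, a=4, b=-4):
    """speakers: list of speaker ids, one per utterance, in dialog order."""
    edges = []  # entries of the form (i, j, relation_type, position_encoding)
    for i, si in enumerate(speakers):          # target utterance u_i
        for j, sj in enumerate(speakers):      # neighboring utterance u_j
            if i == j:
                continue
            if j < i:
                rel = "self-before" if sj == si else "inter-before"
            else:
                rel = "self-after" if sj == si else "inter-after"
            pe = max(b, min(a, j - i))         # relative distance clipped to the window [b, a] (assumed form)
            edges.append((i, j, rel, pe))
    return edges


# Example: a four-turn dialog between speakers A and B.
for edge in build_relational_edges(["A", "B", "A", "B"])[:6]:
    print(edge)
```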
Further, step three comprises the following steps:
A GRU model is adopted as the decoder to generate the reply. The initial decoder state s_0 is computed from the concatenation of the last hidden states of the forward and backward GRUs of the utterance-level encoder through the trainable parameters W_e and b_e; s_t is the hidden state of the decoder at time t, computed by the GRU from the previous state, the word embedding e(r_{t-1}) of the word output at time t-1, and the high-level semantic representation output by the L-th layer of the graph coding layer at time t-1.
Finally, according to the high-level semantic information of the graph coding layer, combined with the decoder hidden state s_t at time t, the output at the current time is predicted, where W_o and b_o are trainable parameters and p denotes the probability of generating a word at the current time.
With a given real dialog reply R = [r_1, r_2, ..., r_T] as the training target, a cross-entropy loss function L = -Σ_{t=1}^{T} log p(r_t) is used to train the model parameters.
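A minimal sketch of this training objective follows: cross-entropy over the tokens of the reference reply, teacher-forced through the GRU decoder. The decoder_step interface and the embedding argument are assumptions introduced only for illustration; the patent does not define such functions.

```python
# Cross-entropy training objective over the reference reply r_1..r_T, with
# teacher forcing; decoder_step is an assumed callable returning the next
# hidden state and vocabulary logits.
import torch
import torch.nn.functional as F

def reply_loss(decoder_step, s0, target_ids, embedding, graph_repr):
    """target_ids: (T,) token ids of the reference reply r_1..r_T."""
    s_t, loss = s0, 0.0
    prev = torch.zeros_like(embedding.weight[0])            # e(r_0): start-of-sequence embedding (assumed zero vector)
    for t in range(target_ids.size(0)):
        s_t, logits = decoder_step(s_t, prev, graph_repr)   # hidden state s_t and vocabulary logits
        loss = loss + F.cross_entropy(logits.unsqueeze(0), target_ids[t:t + 1])
        prev = embedding(target_ids[t])                     # teacher forcing: feed the gold word r_t
    return loss / target_ids.size(0)
```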
the multi-round dialog reply generation system and method based on the relation graph attention network have the following advantages that:
1. compared with the traditional single-round dialogue reply generation, the multi-round dialogue reply generation method can capture high-level feature information among multi-round dialogue utterances through a hierarchical structure, so that the generated reply information has higher correlation and more diversity.
2. The invention adopts the pre-training language model BERT to learn a better text characteristic through a deep model, thereby effectively solving the problem of word ambiguity.
3. The method uses a relational graph attention network model to capture the interdependence relation among text sequences by constructing nodes, edges and corresponding topological structures, thereby further extracting potential feature representation.
Drawings
FIG. 1 is a block diagram of a system for generating a multi-turn dialog reply based on a graph attention network according to the present invention;
FIG. 2 is a conceptual diagram of the relational position of the present invention;
FIG. 3 is a diagram of the position encoding process for the four different relation types of the present invention.
Detailed Description
In order to better understand the purpose, structure and function of the present invention, a multi-turn dialog reply generation system and method based on a graph attention network according to the present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, the method for generating a multi-turn dialog reply based on a relational graph attention network of the present invention includes the following steps:
Step one: acquiring and preprocessing the multi-turn dialog input, converting the word meaning information in each turn of utterance into corresponding vector representations through a pre-trained BERT model to obtain the semantic representation of each turn of utterance, and encoding these utterance representations through a Bi-GRU model to obtain the semantic representation of the dialog context;
Step two: capturing the self-dependencies of utterances within the multi-turn dialog and the dependencies between interlocutors with a graph attention network, and introducing a relational position encoding into the graph attention network to account for the sequential information among the utterances, thereby obtaining the high-level semantic representation of the graph coding layer;
Step three: taking the dialog-context semantic representation and the high-level semantic representation of the relational graph attention encoding as input, and decoding with a GRU model to generate the final dialog reply output.
As a preferred embodiment of the present invention, in step one, preprocessing the multi-turn dialog input and obtaining the semantic representation of each turn of utterance comprises: encoding the words in each input turn of utterance by first applying the BPE algorithm to obtain a sequence-tagged representation of each turn of utterance, and then inputting it into the pre-trained BERT language model for fine-tuning, thereby obtaining the semantic representation of each turn of utterance.
FIG. 1 is a schematic diagram of the framework of the relational graph attention network-based multi-round dialogue reply generation system disclosed by the invention; the overall framework comprises the following parts: a sentence input coding layer, a graph coding layer and a decoding layer. The components of the system will now be described in detail:
(1) sentence input coding layer
The sentence input coding layer comprises two different level coders: a word-level encoder and a speech-level encoder. Both encoders are described in detail below.
Word-level encoder: this encoder encodes the words in each turn of utterance input to the model, so as to obtain the semantic representation of that turn of utterance. For a given multi-turn dialog context of length M, U = {u_1, ..., u_M}, in order to fully extract the information expressed by the user's words, the model first applies the BPE algorithm to obtain the sequence-tagged representation of each turn of utterance, u_i = {x_{i,1}, x_{i,2}, ..., x_{i,T_i}}, where T_i is the number of tokens in the i-th turn of utterance, and then inputs it into the pre-trained BERT language model for fine-tuning. The word-level encoding process is expressed as c_i = BERT(x_{i,1}, x_{i,2}, ..., x_{i,T_i}), thereby obtaining the semantic representation c_i of each turn of utterance.
Utterance-level encoder: this encoder encodes the utterance representations produced by the word-level encoder, so as to obtain the semantic representation of the entire dialog context. A Bi-GRU model is used here as the utterance-level encoder. The model takes the utterance representations c_1, ..., c_M obtained from the upper word-level encoder as input and encodes each turn of utterance with the Bi-GRU:
h_i^fwd = GRU_fwd(c_i, h_{i-1}^fwd),
h_i^bwd = GRU_bwd(c_i, h_{i+1}^bwd),
where h_i^fwd is the i-th hidden state of the forward GRU and h_i^bwd is the i-th hidden state of the backward GRU. The hidden states of the forward and backward GRUs are concatenated to obtain the context-aware semantic representation h_i = [h_i^fwd; h_i^bwd]. High-level feature information between the utterances of the multi-turn dialog is captured through this hierarchical structure.
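The short sketch below illustrates the sequence-tagged representation of a single turn before it is fed to BERT for fine-tuning; the tokenizer name and the example sentence are assumptions, since the patent only states that a BPE algorithm and a pre-trained BERT model are used (the Hugging Face BERT tokenizer applies WordPiece, a closely related subword scheme).

```python
# Subword sequence tagging of one dialog turn u_i (illustrative).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
u_i = "my laptop keeps disconnecting from wifi"
tokens = tokenizer.tokenize(u_i)   # subword sequence x_{i,1}, ..., x_{i,T_i}; exact split depends on the vocabulary
T_i = len(tokens)                  # number of subword tokens in the i-th turn
print(T_i, tokens)
```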
(2) Graph coding layer
The graph coding layer first captures the self-dependencies of utterances within the multi-turn dialog and the dependencies between speakers using a graph attention network. In addition, the model introduces a new position encoding (the relational position encoding) into the graph attention network to account for the sequential information among the utterances.
Different from traditional multi-turn dialogue reply generation work, the method constructs a directed graph G = (V, E, R) over the M utterances of the multi-turn dialog and defines it as follows: each utterance in the multi-turn dialog is defined as a node v_i ∈ V; the relationship dependency information between utterances is defined as an edge (v_i, r, v_j) ∈ E, where r ∈ R is the relation type; and the weight of an edge is defined as α_ijr.
The node representation, edge type definition, relational position encoding, edge weight representation and graph propagation mechanism of the graph will be described in detail below.
In the graph attention network, the model first takes the contextual semantic representation h_i output by the context coding layer as the initial vector representation of node v_i. As shown in fig. 1, in order to fully capture the different relationship dependency information between utterances, the information edges in the graph coding layer are divided into the following four types: (a) self-before edges r1: dependency between the target utterance and earlier utterances of the same speaker; (b) inter-before edges r2: dependency between the target utterance and earlier utterances of the other speakers; (c) self-after edges r3: dependency between the target utterance and later utterances of the same speaker; (d) inter-after edges r4: dependency between the target utterance and later utterances of the other speakers. Furthermore, a method of Relational Position Encodings is proposed herein to capture the temporal information between utterances connected by these four edge types. Unlike previous absolute and relative position encodings, it encodes, under each relation type, the relative distance between utterances; fig. 2 illustrates the concept of the relational position, where the different background colors in the third row represent the different types of information edges. Position encoding is then performed for the four different relation types and the encoded information is added to the edge weights; fig. 3 shows the encoding process. In the relational position encoding, PE_ijr denotes the position encoding between the target utterance u_i and its neighboring utterance u_j under relation type r, taking values within [b, a], where b and a are the sliding-window values of the target utterance with respect to the other utterances, and N_i^r denotes the neighborhood of the target utterance u_i under relation type r. Inspired by the graph attention network model and combined with the above relational position encoding, the edge weight α_ijr between the target utterance u_i and its neighboring utterance u_j under relation type r is then computed by the attention mechanism, where W_r is the parameterized weight matrix in the attention mechanism, a_r is the parameterized weight vector, ·^T denotes transposition, and LRL is the LeakyReLU activation function.
The graph coding layer finally updates the vector representation of each node by aggregating over its neighborhoods N_i^r. The graph propagation is carried out layer by layer with trainable parameter weight matrices, where L is the number of graph convolution layers; the output of the final layer gives the high-level semantic representation of the graph coding layer.
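A sketch of one such relational graph attention layer is given below, written in PyTorch for illustration. Because the exact equations appear only as images in the original text, the precise way PE_ijr enters the attention score, the softmax normalization over the neighborhood, and the use of a ReLU after aggregation are assumptions following the standard relational graph attention formulation, not the patent's definitive implementation.

```python
# One relational graph attention layer: per-relation attention scores from a
# parameterized matrix W_r and vector a_r, a LeakyReLU activation, the relational
# position encoding added to the score, and neighborhood aggregation to update
# each node representation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationalGraphAttentionLayer(nn.Module):
    def __init__(self, dim, relations=("self-before", "inter-before", "self-after", "inter-after")):
        super().__init__()
        self.W = nn.ModuleDict({r: nn.Linear(dim, dim, bias=False) for r in relations})       # W_r
        self.a = nn.ParameterDict({r: nn.Parameter(torch.randn(2 * dim)) for r in relations})  # a_r

    def forward(self, h, edges):
        # h: (M, dim) node vectors; edges: list of (i, j, relation, pe) as built earlier
        scores, messages = {}, {}
        for i, j, r, pe in edges:
            pair = torch.cat([self.W[r](h[i]), self.W[r](h[j])], dim=-1)
            # attention logit for alpha_ijr; adding PE_ijr to the score is an assumed choice
            scores.setdefault(i, []).append(F.leaky_relu(self.a[r] @ pair) + pe)
            messages.setdefault(i, []).append(self.W[r](h[j]))
        h_new = h.clone()
        for i in scores:  # aggregate each node's relation-typed neighborhood
            alpha = torch.softmax(torch.stack(scores[i]), dim=0)
            h_new[i] = torch.relu((alpha.unsqueeze(-1) * torch.stack(messages[i])).sum(dim=0))
        return h_new
```

Stacking L such layers and taking the final layer's output would give the high-level semantic representation described above.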
(3) Decoding layer
The decoding layer generates the response reply based on the input contextual semantic representation and the high-level semantic representation of the graph coding layer. A GRU model is adopted here as the decoder to generate the reply. The initial decoder state s_0 is computed from the concatenation of the last hidden states of the forward and backward GRUs of the utterance-level encoder through the trainable parameters W_e and b_e; s_t is the hidden state of the decoder at time t, computed by the GRU from the previous state, the word embedding e(r_{t-1}) of the word output at time t-1, and the high-level semantic representation output by the L-th layer of the graph coding layer at time t-1.
Finally, according to the high-level semantic information of the graph coding layer, combined with the decoder hidden state s_t at time t, the output at the current time is predicted, where W_o and b_o are trainable parameters and p denotes the probability of generating a word at the current time.
The present invention takes a given real dialog reply R = [r_1, r_2, ..., r_T] as the training target and uses a cross-entropy loss function L = -Σ_{t=1}^{T} log p(r_t) to train the model parameters.
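At inference time the trained decoder can be unrolled step by step; a minimal greedy-decoding sketch is given below. The decoder_step interface, the embedding argument and the special token ids are assumptions introduced for illustration, mirroring the training sketch earlier.

```python
# Greedy reply generation with the GRU decoder described above (illustrative).
import torch

def generate_reply(decoder_step, s0, embedding, graph_repr, bos_id=1, eos_id=2, max_len=30):
    s_t = s0
    prev = embedding(torch.tensor(bos_id))       # e(r_0): start-of-sequence embedding
    reply = []
    for _ in range(max_len):
        s_t, logits = decoder_step(s_t, prev, graph_repr)
        next_id = int(torch.argmax(logits))      # greedy choice of the most probable word
        if next_id == eos_id:
            break
        reply.append(next_id)
        prev = embedding(torch.tensor(next_id))  # feed the generated word back in
    return reply
```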
the invention is verified in two open source data sets Ubuntu and Dailydialog, and the realization result is shown in the following table:
Figure BDA0003250570100000089
Figure BDA0003250570100000091
as can be seen from the table, the evaluation indexes of the method of the invention on two data sets are basically superior to those of other baseline models, and the effectiveness of the attention network method based on the relational graph provided by the invention is verified. Wherein the method of the invention is significantly higher than all baseline models in the PPL, BLEU and BERTScore indexes, which shows that the reply information generated by the method is more relevant and diversified. Meanwhile, compared with the Ours-GCN method, the model provided by the invention is higher than the traditional GCN model in indexes, and the attention mechanism introduced into the graph network layer can effectively capture information between statement relation dependencies; compared with the Ours-NPE method, the method of the invention has greatly improved performance on indexes, and proves the importance of the relation position coding.
It is to be understood that the present invention has been described with reference to certain embodiments, and that various changes in the features and embodiments, or equivalent substitutions may be made therein by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (6)

1. A multi-round dialog reply generation system based on a relational graph attention network, comprising a sentence input coding layer, a graph coding layer and a decoding layer, characterized in that the sentence input coding layer comprises a word-level encoder and an utterance-level encoder; the word-level encoder encodes the words in each turn of utterance input to the model, thereby obtaining the semantic representation of that turn of utterance; the utterance-level encoder encodes these utterance representations, thereby obtaining the semantic representation of the entire dialog context; the graph coding layer first captures the self-dependencies of utterances within the multi-turn dialog and the dependencies between interlocutors with a graph attention network, and then introduces a relational position encoding to account for the sequential information among the utterances; the decoding layer generates a response reply based on the input contextual semantic representation and the high-level semantic representation of the graph coding layer.
2. A multi-round dialog reply generation method using the system of claim 1, comprising the following steps:
Step one: acquiring and preprocessing the multi-turn dialog input, converting the word meaning information in each turn of utterance into corresponding vector representations through a pre-trained BERT model to obtain the semantic representation of each turn of utterance, and encoding these utterance representations through a Bi-GRU model to obtain the semantic representation of the dialog context;
Step two: capturing the self-dependencies of utterances within the multi-turn dialog and the dependencies between interlocutors with a graph attention network, and introducing a relational position encoding into the graph attention network to account for the sequential information among the utterances, thereby obtaining the high-level semantic representation of the graph coding layer;
Step three: taking the dialog-context semantic representation and the high-level semantic representation of the relational graph attention encoding as input, and decoding with a GRU model to generate the final dialog reply output.
3. The multi-turn dialog reply generation method based on the relational graph attention network according to claim 2, characterized in that in step one, preprocessing the multi-turn dialog input and obtaining the semantic representation of each turn of utterance comprises: encoding the words in each input turn of utterance by first applying the BPE algorithm to obtain a sequence-tagged representation of each turn of utterance, and then inputting it into the pre-trained BERT language model for fine-tuning, thereby obtaining the semantic representation of each turn of utterance.
4. The relational graph attention network-based multi-turn dialog reply generation method according to claim 3, characterized in that step one encodes with a word-level encoder and an utterance-level encoder, specifically as follows:
for a given multi-turn dialog context of length M, U = {u_1, ..., u_M}, the word-level encoder first applies the BPE algorithm to obtain the sequence-tagged representation of each turn of utterance, u_i = {x_{i,1}, x_{i,2}, ..., x_{i,T_i}}, where T_i is the number of tokens in the i-th turn of utterance, and then inputs it into the pre-trained BERT language model for fine-tuning; the word-level encoding process is expressed as c_i = BERT(x_{i,1}, x_{i,2}, ..., x_{i,T_i}), thereby obtaining the semantic representation c_i of each turn of utterance;
a Bi-GRU model is adopted as the utterance-level encoder; the Bi-GRU takes the utterance representations c_1, ..., c_M obtained from the upper word-level encoder as input and encodes each turn of utterance, the utterance-level encoding process being expressed as h_i^fwd = GRU_fwd(c_i, h_{i-1}^fwd) and h_i^bwd = GRU_bwd(c_i, h_{i+1}^bwd), where h_i^fwd is the i-th hidden state of the forward GRU and h_i^bwd is the i-th hidden state of the backward GRU; the hidden states of the forward and backward GRUs are concatenated to obtain the context-aware semantic representation h_i = [h_i^fwd; h_i^bwd]; high-level feature information between the utterances of the multi-turn dialog is captured through this hierarchical structure.
5. The dialog reply generation method according to claim 4, characterized in that obtaining the high-level semantic representation of the graph coding layer in step two comprises the following steps:
a directed graph G = (V, E, R) is constructed over the M utterances of the multi-turn dialog and defined as follows: each utterance in the multi-turn dialog is defined as a node v_i ∈ V; the relationship dependency information between utterances is defined as an edge (v_i, r, v_j) ∈ E, where r ∈ R is the relation type; and the weight of an edge is defined as α_ijr;
(1) first, the contextual semantic representation h_i output by the context coding layer is taken as the initial vector representation of node v_i;
(2) the information edges r are constructed over the nodes, and their types are divided into the following four categories: (a) self-before edges r1: dependency between the target utterance and earlier utterances of the same speaker; (b) inter-before edges r2: dependency between the target utterance and earlier utterances of the other speakers; (c) self-after edges r3: dependency between the target utterance and later utterances of the same speaker; (d) inter-after edges r4: dependency between the target utterance and later utterances of the other speakers;
(3) relational position encoding (Relational Position Encodings) is used to capture the temporal information between utterances connected by these four edge types; PE_ijr denotes the relational position encoding between the target utterance u_i and its neighboring utterance u_j under relation type r, taking values within [b, a], where b and a are the sliding-window values of the target utterance with respect to the other utterances, and N_i^r denotes the neighborhood of the target utterance u_i under relation type r;
(4) the weights of the relational information edges are calculated: α_ijr denotes the edge weight between the target utterance u_i and its neighboring utterance u_j under relation type r, computed by the attention mechanism combined with the relational position encoding, where W_r is the parameterized weight matrix in the attention mechanism, a_r is the parameterized weight vector, ·^T denotes transposition, and LRL is the LeakyReLU activation function;
(5) the vector representation of each node is updated by aggregating over its neighborhoods N_i^r; the graph propagation is carried out layer by layer with trainable parameter weight matrices, where L is the number of graph convolution layers, and the output of the final layer gives the high-level semantic representation of the graph coding layer.
6. The dialog reply generation method based on the relational graph attention network according to claim 5, characterized in that step three comprises the following steps:
a GRU model is adopted as the decoder to generate the reply; the initial decoder state s_0 is computed from the concatenation of the last hidden states of the forward and backward GRUs of the utterance-level encoder through the trainable parameters W_e and b_e; s_t is the hidden state of the decoder at time t, computed by the GRU from the previous state, the word embedding e(r_{t-1}) of the word output at time t-1, and the high-level semantic representation output by the L-th layer of the graph coding layer at time t-1;
finally, according to the high-level semantic information of the graph coding layer, combined with the decoder hidden state s_t at time t, the output at the current time is predicted, where W_o and b_o are trainable parameters and p denotes the probability of generating a word at the current time;
with a given real dialog reply R = [r_1, r_2, ..., r_T] as the training target, a cross-entropy loss function L = -Σ_{t=1}^{T} log p(r_t) is used to train the model parameters.
CN202111044215.XA 2021-09-07 2021-09-07 Multi-round dialog reply generation system and method based on relational graph attention network Pending CN114281954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111044215.XA CN114281954A (en) 2021-09-07 2021-09-07 Multi-round dialog reply generation system and method based on relational graph attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111044215.XA CN114281954A (en) 2021-09-07 2021-09-07 Multi-round dialog reply generation system and method based on relational graph attention network

Publications (1)

Publication Number Publication Date
CN114281954A true CN114281954A (en) 2022-04-05

Family

ID=80868514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111044215.XA Pending CN114281954A (en) 2021-09-07 2021-09-07 Multi-round dialog reply generation system and method based on relational graph attention network

Country Status (1)

Country Link
CN (1) CN114281954A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292491A (en) * 2022-08-04 2022-11-04 四川大学 CTMSN-EHI-based task-type multi-round dialogue information processing method
CN116089593A (en) * 2023-03-24 2023-05-09 齐鲁工业大学(山东省科学院) Multi-pass man-machine dialogue method and device based on time sequence feature screening coding module
KR102610897B1 * 2023-03-24 2023-12-07 Qilu University of Technology (Shandong Academy of Sciences) Method and device for multi-pass human-machine conversation based on time sequence feature screening and encoding module

Similar Documents

Publication Publication Date Title
CN108681610B (en) generating type multi-turn chatting dialogue method, system and computer readable storage medium
Oord et al. Parallel wavenet: Fast high-fidelity speech synthesis
CN110060690B (en) Many-to-many speaker conversion method based on STARGAN and ResNet
CN109785824B (en) Training method and device of voice translation model
CN111159368B (en) Reply generation method of personalized dialogue
CN108153913B (en) Training method of reply information generation model, reply information generation method and device
GB2572020A (en) A speech processing system and a method of processing a speech signal
CN111966800B (en) Emotion dialogue generation method and device and emotion dialogue model training method and device
CN110297887B (en) Service robot personalized dialogue system and method based on cloud platform
CN112115687B (en) Method for generating problem by combining triplet and entity type in knowledge base
Liu et al. Reinforcement learning for emotional text-to-speech synthesis with improved emotion discriminability
CN113158665A (en) Method for generating text abstract and generating bidirectional corpus-based improved dialog text
CN110060657B (en) SN-based many-to-many speaker conversion method
CN111368142B (en) Video intensive event description method based on generation countermeasure network
CN115964467A (en) Visual situation fused rich semantic dialogue generation method
CN114281954A (en) Multi-round dialog reply generation system and method based on relational graph attention network
KR20230127293A (en) Information synthesis method and device, electronic device and computer readable storage medium
Yadav et al. Speech prediction in silent videos using variational autoencoders
CN115662435B (en) Virtual teacher simulation voice generation method and terminal
CN114023300A (en) Chinese speech synthesis method based on diffusion probability model
CN111782788A (en) Automatic emotion reply generation method for open domain dialogue system
CN113360610A (en) Dialog generation method and system based on Transformer model
CN112364148A (en) Deep learning method-based generative chat robot
CN115269836A (en) Intention identification method and device
CN112100350B (en) Open domain dialogue method for intensifying reply personalized expression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination