CN111274375A

CN111274375A - Multi-turn dialogue method and system based on bidirectional GRU network

Info

Publication number: CN111274375A
Application number: CN202010067240.9A
Authority: CN
Inventors: 陈羽中; 谢琪; 刘漳辉
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2020-06-12
Anticipated expiration: 2040-01-20
Also published as: CN111274375B

Abstract

The invention relates to a multi-turn dialogue method and system based on a bidirectional GRU network, wherein the method comprises the following steps: step A: collecting dialogue context and answer data and constructing a dialogue training set D; step B: using the dialogue training set D to train a deep learning network model M that fuses a bidirectional GRU network; step C: conversing with the user, inputting the user question into the trained deep learning network model M, and outputting the matched answer. The method and the system help improve how well answers match the user's questions.

Description

Multi-turn dialogue method and system based on bidirectional GRU network

Technical Field

The invention relates to the field of natural language processing, in particular to a multi-turn dialogue method and a multi-turn dialogue system based on a bidirectional GRU network.

Background

In recent years, the rapid development of deep learning and neural networks has transformed the field of artificial intelligence. As one of the core technologies of artificial intelligence, multi-turn dialogue has become a research hotspot. It can be applied widely in industries such as human-computer interaction, smart homes, intelligent customer service, intelligent family education, and social robots, and it has considerable research significance, academic value, and application value, so it receives sustained attention from academia and strong interest from industry.

Lowe et al. concatenate the sentences of the dialogue context into a single context matrix that is matched against the answer, thereby taking the overall semantics of the dialogue context into account. Yan et al. concatenate the context sentences with the input message to form a new query and perform the matching with a deep neural network architecture. Zhou et al. improve response selection with a multi-view model that contains both an utterance view and a word view. Zhou et al. also propose an attention-based algorithm for matching dialogue context and answers; the method constructs two matching matrices, using a self-attention mechanism and an interactive attention mechanism that are both based on scaled attention, and the authors verify its effectiveness. Wu et al. match the candidate answer against each context sentence and use an RNN to preserve the order of the sentence semantics, which improves system performance and indicates that interaction between the answer and each context sentence is effective. Zhou et al. likewise let the answer interact with each context sentence, but represent sentences at different levels with an encoding layer rather than an RNN. They use the attention mechanism to extract more dependency information between the dialogue and the answer, and add all of this information together to compute the matching degree. Existing attention-based models can extract more dependency information between dialogue and answer, but they are easily affected by noise and do not capture long-term dependencies well.

Disclosure of Invention

The invention aims to provide a multi-turn dialogue method and a multi-turn dialogue system based on a bidirectional GRU network, which are beneficial to improving the matching of answers to user questions.

In order to achieve the purpose, the invention adopts the technical scheme that: a multi-turn dialogue method based on a bidirectional GRU network comprises the following steps:

Step A: collecting dialogue context and answer data, and constructing a dialogue training set TS;

Step B: training a deep learning network model that fuses a bidirectional GRU network, using the dialogue training set TS;

Step C: conversing with the user, inputting the user question into the trained deep learning network model, and outputting the matched answer.

Further, the step B specifically includes the following steps:

step B1: traversing a dialogue Training Set (TS), and coding the dialogue context and answer of each training sample to obtain an initial characterization vector;

step B2: inputting the initial characterization vectors of the conversation context and the answer into a multi-head attention mechanism module to obtain semantic characterization vectors of the conversation and the answer, and calculating a word similarity matrix of the conversation and the answer;

step B3: inputting the dialogue context and the initial characterization vector of the answer obtained in the step B1 into a bidirectional GRU network, calculating bidirectional hidden states of the dialogue and the answer, and then calculating a forward semantic characterization matrix and a reverse semantic characterization matrix of the dialogue and the answer;

step B4: merging a word similarity matrix, a forward semantic representation matrix and a reverse semantic representation matrix of the conversation and the answer into a tensor, inputting the tensor into a two-dimensional convolution neural network, and then performing feature dimensionality reduction to obtain a representation vector sequence fusing semantic information of the conversation and the answer;

Step B5: inputting the characterization vector sequence obtained in step B4 into a bidirectional GRU network to obtain a characterization vector that fuses the context dependency between dialogue and answer with their semantic information;

Step B6: repeating steps B2 to B5 to calculate this characterization vector for every training sample in the dialogue training set;

Step B7: inputting the characterization vectors of all samples into the fully connected layer of the deep learning network model, calculating the gradient of each parameter in the deep network by back propagation according to the target loss function loss, and updating the parameters by stochastic gradient descent;

Step B8: terminating the training of the deep learning network model when the loss value it produces falls below a set threshold or the maximum number of iterations is reached.
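The stopping rule of step B8 can be illustrated with a minimal Python sketch. The names `train`, `step_fn`, `loss_threshold` and `max_iterations` are hypothetical, and the toy objective stands in for the real network, which is not reproduced here.

```python
def train(step_fn, loss_threshold=1e-3, max_iterations=1000):
    """Call step_fn repeatedly; stop when the loss drops below the
    threshold or the iteration cap is reached (the rule of step B8)."""
    loss = float("inf")
    iterations = 0
    while iterations < max_iterations:
        loss = step_fn()          # one parameter update, returns current loss
        iterations += 1
        if loss < loss_threshold:
            break
    return loss, iterations


def make_toy_step(lr=0.1):
    """Toy stand-in for a training step: gradient descent on (w - 3)^2."""
    state = {"w": 0.0}

    def step():
        grad = 2.0 * (state["w"] - 3.0)
        state["w"] -= lr * grad
        return (state["w"] - 3.0) ** 2

    return step


if __name__ == "__main__":
    final_loss, n = train(make_toy_step(), loss_threshold=1e-6, max_iterations=500)
    print(final_loss, n)
```

Either condition alone ends training, so a model that never reaches the threshold still stops at the iteration cap.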

Further, in step B1, the dialogue training set TS consists of N training samples, where N denotes the number of training samples and (U, a) denotes a training sample in TS consisting of a dialogue context U and an answer a. The dialogue context U consists of several sentences from the dialogue process, and each sentence in U and the answer a are encoded separately to obtain their initial characterization vectors. If u_t denotes the t-th sentence in the dialogue context U, the initial characterization vector of u_t and the initial characterization vector of the answer a are each expressed as the sequence of the word vectors of their words, where L_t and L_a denote the number of words remaining in u_t and a, respectively, after word segmentation and stop-word removal. The word vector of the i-th word of u_t or of a is looked up in the pre-trained word vector matrix, where d₁ denotes the dimension of the word vectors and |D| denotes the number of words in the dictionary.
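The encoding of step B1 can be sketched as follows. This is only an illustration under stated assumptions: whitespace splitting stands in for word segmentation, words missing from the dictionary are skipped, and `encode_sentence`, `vocab` and `embeddings` are hypothetical names rather than anything defined by the invention.

```python
def encode_sentence(sentence, stop_words, vocab, embeddings):
    """Return the initial characterization of a sentence: the list of
    word vectors of its words after segmentation and stop-word removal.

    vocab maps a word to its row index in the pre-trained matrix;
    embeddings is that matrix, one d1-dimensional row per dictionary word.
    """
    # Assumption: whitespace splitting replaces a real word segmenter.
    tokens = [w for w in sentence.lower().split() if w not in stop_words]
    # Assumption: out-of-dictionary words are dropped.
    return [embeddings[vocab[w]] for w in tokens if w in vocab]


if __name__ == "__main__":
    vocab = {"hello": 0, "world": 1}
    embeddings = [[1.0, 0.0], [0.0, 1.0]]   # |D| = 2 words, d1 = 2
    print(encode_sentence("Hello the world", {"the"}, vocab, embeddings))
```

The length of the returned list corresponds to L_t (or L_a for the answer), and each entry has dimension d₁.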

Further, the step B2 specifically includes the following steps:

Step B21: selecting an integer s that divides d₁; for each sentence u_t in the dialogue context, splitting the last dimension of its initial characterization vector, and of the initial characterization vector of the answer a, into s sub-vectors each, to obtain two sub-vector sequences whose h-th elements are the h-th sub-vector of u_t and the h-th sub-vector of a, respectively;

Step B22: pairing each sub-vector of u_t with the corresponding sub-vector of a, for h = 1, 2, ..., s, inputting each pair into the attention mechanism module, and calculating the semantic representation vector of the h-th sub-vector of u_t and the semantic representation vector of the h-th sub-vector of a, where T in the calculation formulas denotes the matrix transpose operation; then calculating the weighted concatenation of the s semantic representation vectors of the sub-vectors of u_t to obtain the semantic representation vector of u_t, and the weighted concatenation of those of a to obtain the semantic representation vector of a, where W₁ and W₂ are training parameters of the multi-head attention mechanism;

Step B23: calculating the word similarity matrix between each sentence in the dialogue context and the answer; with u_t denoting the t-th sentence in the dialogue context, its word similarity matrix with the answer a is M_{1,t}.
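Since the formulas of step B2 are not reproduced in this text, the following Python sketch only illustrates the general technique under explicit assumptions: each word vector is split into s sub-vectors as in step B21, each head uses plain scaled dot-product attention in which the keys also serve as values, the learned weights W₁ and W₂ are omitted, and the word similarity matrix is taken as a matrix of dot products. All function names are hypothetical.

```python
import math


def split_heads(seq, s):
    """Step B21: split every d1-dimensional word vector into s sub-vectors."""
    d1 = len(seq[0])
    if d1 % s != 0:
        raise ValueError("s must divide d1 exactly")
    width = d1 // s
    return [[vec[h * width:(h + 1) * width] for vec in seq] for h in range(s)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """Assumed head: scaled dot-product attention, keys reused as values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * k[j] for w, k in zip(weights, keys))
                    for j in range(len(keys[0]))])
    return out


def multi_head(seq_q, seq_k, s):
    """Run one head per sub-vector pair and concatenate the head outputs."""
    heads = [attend(q, k)
             for q, k in zip(split_heads(seq_q, s), split_heads(seq_k, s))]
    return [sum((head[i] for head in heads), []) for i in range(len(seq_q))]


def similarity_matrix(rep_u, rep_a):
    """Assumed word similarity matrix: dot product of every word pair."""
    return [[sum(x * y for x, y in zip(u, a)) for a in rep_a] for u in rep_u]
```

With a sentence of L_t words and an answer of L_a words, `similarity_matrix` returns an L_t by L_a matrix, which is the shape needed for the tensor assembled in step B4.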

further, the step B3 specifically includes the following steps:

Step B31: treating the initial characterization vector of the answer as a sequence of word vectors, inputting the sequence into a bidirectional GRU network, and calculating the forward and reverse hidden state vectors. Specifically, the word vectors of the answer a are input in turn into the forward GRU to obtain the forward hidden state vectors of the answer, and input in turn into the reverse GRU to obtain the reverse hidden state vectors of the answer, where d₂ is the number of units of the GRU;

Step B32: treating the initial characterization vector of each sentence in the dialogue context as a sequence of word vectors, inputting the sequence into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors. With u_t denoting the t-th sentence in the dialogue context, its word vectors are input in turn into the forward GRU to obtain the forward hidden state vectors of u_t, and input in turn into the reverse GRU to obtain the reverse hidden state vectors of u_t;

Step B33: calculating the forward semantic representation matrix and the reverse semantic representation matrix of each sentence in the dialogue context; with u_t denoting the t-th sentence in the dialogue context, its forward semantic representation matrix with the answer a is M_{2,t} and its reverse semantic representation matrix with the answer a is M_{3,t}.
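A compact Python sketch of step B3 follows. It assumes a standard GRU cell without bias terms and with randomly initialised weights, and it assumes that the forward and reverse matrices are formed from dot products of the corresponding hidden states, because the original formulas for M_{2,t} and M_{3,t} are not reproduced here. `GRUCell`, `bigru` and `match_matrix` are hypothetical names.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell with d_hid units (d2 in the text); biases omitted."""

    def __init__(self, d_in, d_hid, rng):
        def mat(cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(d_hid)]
        self.d_hid = d_hid
        self.Wz, self.Wr, self.Wh = mat(d_in), mat(d_in), mat(d_in)
        self.Uz, self.Ur, self.Uh = mat(d_hid), mat(d_hid), mat(d_hid)

    def step(self, x, h):
        z = [sigmoid(dot(w, x) + dot(u, h)) for w, u in zip(self.Wz, self.Uz)]
        r = [sigmoid(dot(w, x) + dot(u, h)) for w, u in zip(self.Wr, self.Ur)]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(dot(w, x) + dot(u, rh))
                for w, u in zip(self.Wh, self.Uh)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, seq):
        h, states = [0.0] * self.d_hid, []
        for x in seq:
            h = self.step(x, h)
            states.append(h)
        return states


def bigru(seq, fwd, bwd):
    """Return forward and reverse hidden states, both aligned to word order."""
    forward = fwd.run(seq)
    backward = bwd.run(seq[::-1])[::-1]
    return forward, backward


def match_matrix(states_u, states_a):
    """Assumed semantic matrix: dot product of every pair of hidden states."""
    return [[dot(hu, ha) for ha in states_a] for hu in states_u]
```

Applying `match_matrix` to the forward states gives an illustrative M_{2,t}, and applying it to the reverse states gives an illustrative M_{3,t}; both have one row per word of u_t and one column per word of a.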

further, the step B4 specifically includes the following steps:

Step B41: merging M_{1,t}, M_{2,t} and M_{3,t} to obtain the tensor M_t:

M_t = [M_{1,t}, M_{2,t}, M_{3,t}]

Step B42: inputting M_t into a two-dimensional convolutional neural network for convolution and pooling, and then into a fully connected layer for dimensionality reduction, to obtain a characterization vector that fuses the semantic information of u_t and a, where d₃ is the dimension after dimensionality reduction by the fully connected layer;

Step B43: for each sentence in the dialogue context U, calculating the characterization vector of its semantic information with the answer a, to obtain a sequence of L_u characterization vectors, where L_u is the number of sentences in the dialogue context U.
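Steps B41 and B42 can be illustrated with a small Python sketch. It assumes a single convolution kernel with ReLU activation, non-overlapping max pooling, and a plain linear layer for the dimensionality reduction; the kernel size, pooling size and all names are illustrative choices, not values given by the invention.

```python
def conv2d(tensor, kernel):
    """Valid 2-D convolution over a C x H x W tensor with one C x kh x kw
    kernel, followed by ReLU. Returns one feature map."""
    channels, height, width = len(tensor), len(tensor[0]), len(tensor[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(height - kh + 1):
        row = []
        for j in range(width - kw + 1):
            acc = sum(tensor[c][i + di][j + dj] * kernel[c][di][dj]
                      for c in range(channels)
                      for di in range(kh) for dj in range(kw))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def max_pool(fmap, size):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def dense(vec, weights, bias):
    """Fully connected layer reducing the flattened features to d3 values."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def fuse(m1, m2, m3, kernel, weights, bias, pool=2):
    tensor = [m1, m2, m3]                      # step B41: three channels
    pooled = max_pool(conv2d(tensor, kernel), pool)
    flat = [x for row in pooled for x in row]
    return dense(flat, weights, bias)          # step B42: vector of length d3
```

Treating the three matrices as channels of one tensor lets a single convolution combine word-level similarity with forward and reverse sequence information at each word pair.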

Further, in step B5, the characterization vector sequence is input into a bidirectional GRU network, which models the relationship between the dialogue context and the answer; the final hidden state vector output by the network is taken as the characterization vector that fuses the context dependency between dialogue and answer with their semantic information.

Further, the step B7 specifically includes the following steps:

Step B71: inputting the final characterization vector into the fully connected layer and using softmax normalization to calculate the probability that the answer belongs to each category:

g^c(U, a) = softmax(y)

where y is the output of the fully connected layer, W_s is the weight matrix of the fully connected layer, b_s is the bias term of the fully connected layer, and g^c(U, a) is the probability that the answer a to the dialogue context U, for the training sample (U, a) processed in step B1, belongs to category c, with 0 ≤ g^c(U, a) ≤ 1 and c ∈ {correct, wrong};

Step B72: using cross entropy as the loss function to calculate the loss value, adjusting the learning rate with the gradient optimization algorithm AdaGrad, and training the model by iteratively updating the model parameters through back propagation so as to minimize the loss function. In the formula for the minimized loss function Loss, (U_i, a_i) denotes the i-th training sample in the dialogue training set TS, and y_i ∈ {0, 1} is its class label.
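The quantities named in step B7 can be sketched in Python as follows. The loss formula itself is not reproduced in this text, so the sketch assumes the usual binary cross entropy averaged over the samples, and a textbook AdaGrad update; `cross_entropy`, `adagrad_update` and the small constants are illustrative assumptions.

```python
import math


def softmax(y):
    """Normalise the fully connected layer output y into probabilities."""
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(prob_correct, labels, eps=1e-12):
    """Assumed loss: mean binary cross entropy, where prob_correct[i] is the
    predicted probability that answer a_i is correct and labels[i] is y_i."""
    total = 0.0
    for p, y in zip(prob_correct, labels):
        total += y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return -total / len(labels)


def adagrad_update(params, grads, cache, lr=0.01, eps=1e-8):
    """Textbook AdaGrad: scale each step by the root of the accumulated
    squared gradients, so frequently updated parameters take smaller steps."""
    for i, g in enumerate(grads):
        cache[i] += g * g
        params[i] -= lr * g / (math.sqrt(cache[i]) + eps)


if __name__ == "__main__":
    probs = softmax([2.0, 0.5])      # [P(correct), P(wrong)] for one sample
    print(cross_entropy([probs[0]], [1]))
```

In a full implementation the gradients passed to `adagrad_update` would come from back propagation through the whole network.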

The invention also provides a multi-turn dialogue system adopting the above method, comprising:

a training set building module for collecting the dialogue context and answer data and building the dialogue training set TS;

a model training module for training the deep learning network model that fuses the bidirectional GRU network, using the dialogue training set TS; and

a multi-turn dialogue module for conversing with the user, inputting the user's questions into the trained deep learning network model, and outputting the best-matching answers.
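The role of the multi-turn dialogue module, choosing the best-matching answer among candidates, can be sketched as below. The scoring function here is a toy word-overlap measure standing in for the trained model; `select_answer` and `overlap_score` are hypothetical names, and how candidate answers are gathered is not specified by this sketch.

```python
def overlap_score(context, answer):
    """Toy stand-in for the trained model: the fraction of answer words
    that also occur somewhere in the dialogue context."""
    context_words = {w for sentence in context for w in sentence.lower().split()}
    answer_words = answer.lower().split()
    if not answer_words:
        return 0.0
    return sum(w in context_words for w in answer_words) / len(answer_words)


def select_answer(context, candidates, score=overlap_score):
    """Return the candidate answer with the highest matching score."""
    if not candidates:
        raise ValueError("at least one candidate answer is required")
    return max(candidates, key=lambda a: score(context, a))


if __name__ == "__main__":
    context = ["my printer is broken", "it shows a paper jam"]
    print(select_answer(context, ["remove the paper jam", "goodbye"]))
```

Passing the scoring function as a parameter keeps the selection logic independent of the model, so the trained network could replace the toy scorer without changing `select_answer`.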

Compared with the prior art, the invention has the following beneficial effects: the multi-turn dialogue method and system based on a bidirectional GRU network capture long-term dependencies by using multi-head attention, and because the multi-head attention mechanism works at a finer granularity than a conventional attention mechanism, the influence of noise is reduced. In addition, the bidirectional GRU better captures the temporal relationships between sentences, which improves the accuracy with which answers match the user's questions. The invention is therefore highly practical and has broad application prospects.

Drawings

Fig. 1 is a flowchart of a method implementation of an embodiment of the invention.

Fig. 2 is a schematic structural diagram of a system according to an embodiment of the present invention.

FIG. 3 is a diagram of a model architecture according to an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the figures and the embodiments.

The invention provides a multi-turn dialogue method based on a bidirectional GRU network, which comprises the following steps as shown in figure 1:

Step A: collecting dialogue context and answer data to construct a dialogue training set TS.

Step B: training a deep learning network model that fuses a bidirectional GRU network, using the dialogue training set TS.

FIG. 3 is an architecture diagram of a deep learning network model in an embodiment of the invention. Training the model using a dialog training set TS specifically comprises the steps of:

Step B1: traversing the dialogue training set TS and encoding the dialogue context and answer of each training sample to obtain initial characterization vectors.

The dialogue training set TS consists of N training samples (U, a), each made up of a dialogue context U and an answer a. Each sentence u_t in U and the answer a are encoded separately as sequences of word vectors, which are looked up in the pre-trained word vector matrix after word segmentation and stop-word removal.

Step B2: inputting the initial characterization vectors of the dialogue context and the answer into a multi-head attention mechanism module to obtain semantic characterization vectors of the dialogue and the answer, and calculating a word similarity matrix of the dialogue and the answer. This specifically comprises the following steps:

Step B21: selecting an integer s that divides d₁; for each sentence u_t in the dialogue context, splitting the last dimension of its initial characterization vector, and of the initial characterization vector of the answer a, into s sub-vectors each, to obtain two sub-vector sequences whose h-th elements are the h-th sub-vector of u_t and the h-th sub-vector of a, respectively;

Step B22: pairing each sub-vector of u_t with the corresponding sub-vector of a, inputting each pair into the attention mechanism module, and calculating the semantic representation vectors of the paired sub-vectors, where T in the calculation formulas denotes the matrix transpose operation; then calculating the weighted concatenation of the sub-vector representations to obtain the semantic representation vector of u_t and the semantic representation vector of a, where W₁ and W₂ are training parameters of the multi-head attention mechanism;

Step B23: calculating the word similarity matrix M_{1,t} between the t-th sentence u_t in the dialogue context and the answer a.

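Step B3: inputting the initial characterization vectors of the dialogue context and the answer obtained in step B1 into the bidirectional GRU network, calculating the bidirectional hidden states of the dialogue and the answer, and then calculating the forward semantic characterization matrix and the reverse semantic characterization matrix of the dialogue and the answer. This specifically comprises the following steps:

Step B31: treating the initial characterization vector of the answer as a sequence of word vectors and inputting it in turn into the forward GRU and the reverse GRU to obtain the forward and reverse hidden state vectors of the answer, where d₂ is the number of units of the GRU;

Step B32: with u_t denoting the t-th sentence in the dialogue context, treating its initial characterization vector as a sequence of word vectors and inputting it in turn into the forward GRU and the reverse GRU to obtain the forward and reverse hidden state vectors of u_t;

Step B33: calculating the forward semantic representation matrix M_{2,t} and the reverse semantic representation matrix M_{3,t} of u_t with the answer a.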

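Step B4: combining the word similarity matrix, the forward semantic representation matrix and the reverse semantic representation matrix of the dialogue and the answer into a tensor, inputting the tensor into a two-dimensional convolutional neural network, and then performing feature dimensionality reduction to obtain a characterization vector sequence that fuses the semantic information of the dialogue and the answer. This specifically comprises the following steps:

Step B41: merging M_{1,t}, M_{2,t} and M_{3,t} to obtain the tensor M_t:

M_t = [M_{1,t}, M_{2,t}, M_{3,t}]

Step B42: inputting M_t into the two-dimensional convolutional neural network for convolution and pooling, and then into a fully connected layer for dimensionality reduction, to obtain a characterization vector that fuses the semantic information of u_t and a;

Step B43: calculating this characterization vector for each sentence in the dialogue context U, to obtain a sequence of L_u characterization vectors, where L_u is the number of sentences in the dialogue context U.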

Step B5: inputting the characterization vector sequence obtained in step B4 into a bidirectional GRU network, which models the relationship between the dialogue context and the answer, and taking the final hidden state vector output by the network as the characterization vector that fuses the context dependency between dialogue and answer with their semantic information.

Step B6: repeating steps B2 to B5 to calculate this characterization vector for every training sample in the dialogue training set.

Step B7: inputting the characterization vectors of all samples into the fully connected layer of the deep learning network model, calculating the gradient of each parameter in the deep network by back propagation according to the target loss function loss, and updating the parameters by stochastic gradient descent. This specifically comprises the following steps:

Step B71: inputting the final characterization vector into the fully connected layer and using softmax normalization to calculate the probability that the answer belongs to each category:

g^c(U, a) = softmax(y)

where y is the output of the fully connected layer, W_s is the weight matrix of the fully connected layer, b_s is the bias term of the fully connected layer, and g^c(U, a) is the probability that the answer a to the dialogue context U, for the training sample (U, a) processed in step B1, belongs to category c, with 0 ≤ g^c(U, a) ≤ 1 and c ∈ {correct, wrong};

Step B72: using cross entropy as the loss function to calculate the loss value Loss, and training the model by minimizing it.

The invention also provides a multi-turn dialog system adopting the method, as shown in fig. 2, comprising:

The above are preferred embodiments of the present invention. Any change made according to the technical scheme of the present invention falls within the protection scope of the present invention, provided that its functional effects do not exceed the scope of that technical scheme.

Claims

1. A multi-turn dialogue method based on a bidirectional GRU network is characterized by comprising the following steps:

2. The method of claim 1, wherein step B specifically comprises the following steps:

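Step B7: inputting the characterization vectors of all samples into the fully connected layer of the deep learning network model, calculating the gradient of each parameter in the deep network by back propagation according to the target loss function loss, and updating the parameters by stochastic gradient descent.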

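3. The method of claim 2, wherein in step B1, the dialogue training set TS consists of N training samples (U, a), each made up of a dialogue context U and an answer a; each sentence u_t in U and the answer a are encoded separately as sequences of word vectors, which are looked up in the pre-trained word vector matrix after word segmentation and stop-word removal.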

4. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 3, wherein said step B2 specifically comprises the steps of:

Step B21: selecting an integer s that divides d₁; for each sentence u_t in the dialogue context, splitting the last dimension of its initial characterization vector, and of the initial characterization vector of the answer a, into s sub-vectors each, to obtain two sub-vector sequences whose h-th elements are the h-th sub-vector of u_t and the h-th sub-vector of a, respectively;

Step B22: pairing each sub-vector of u_t with the corresponding sub-vector of a, inputting each pair into the attention mechanism module, and calculating the semantic representation vectors of the paired sub-vectors, where T in the calculation formulas denotes the matrix transpose operation; then calculating the weighted concatenation of the sub-vector representations to obtain the semantic representation vector of u_t and the semantic representation vector of a, where W₁ and W₂ are training parameters of the multi-head attention mechanism;

Step B23: calculating the word similarity matrix between each sentence in the dialogue context and the answer; with u_t denoting the t-th sentence in the dialogue context, its word similarity matrix with the answer a is M_{1,t}.

5. the method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 4, wherein said step B3 specifically comprises the steps of:

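Step B31: treating the initial characterization vector of the answer as a sequence of word vectors and inputting it in turn into the forward GRU and the reverse GRU to obtain the forward and reverse hidden state vectors of the answer, where d₂ is the number of units of the GRU;

Step B32: with u_t denoting the t-th sentence in the dialogue context, treating its initial characterization vector as a sequence of word vectors and inputting it in turn into the forward GRU and the reverse GRU to obtain the forward and reverse hidden state vectors of u_t;

Step B33: calculating the forward semantic representation matrix M_{2,t} and the reverse semantic representation matrix M_{3,t} of u_t with the answer a.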

6. the method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 5, wherein said step B4 specifically comprises the steps of:

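Step B41: merging M_{1,t}, M_{2,t} and M_{3,t} to obtain the tensor M_t:

M_t = [M_{1,t}, M_{2,t}, M_{3,t}]

Step B42: inputting M_t into a two-dimensional convolutional neural network for convolution and pooling, and then into a fully connected layer for dimensionality reduction, to obtain a characterization vector that fuses the semantic information of u_t and a;

Step B43: calculating this characterization vector for each sentence in the dialogue context U, to obtain a sequence of L_u characterization vectors, where L_u is the number of sentences in the dialogue context U.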

7. The method of claim 6, wherein in step B5, the characterization vector sequence is input into a bidirectional GRU network, which models the relationship between the dialogue context and the answer, and the final hidden state vector output by the network is taken as the characterization vector that fuses the context dependency between dialogue and answer with their semantic information.

8. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 7, wherein said step B7 specifically comprises the steps of:

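Step B71: inputting the final characterization vector into the fully connected layer and using softmax normalization to calculate the probability that the answer belongs to each category:

g^c(U, a) = softmax(y)

Step B72: using cross entropy as the loss function to calculate the loss value and training the model by minimizing the loss function Loss, in whose formula (U_i, a_i) denotes the i-th training sample in the dialogue training set TS and y_i ∈ {0, 1} is its class label.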

9. A multi-turn dialog system employing the method of any of claims 1-8 comprising: