CN112632236A - Improved sequence matching network-based multi-turn dialogue model - Google Patents

Improved sequence matching network-based multi-turn dialogue model

Info

Publication number
CN112632236A
CN112632236A (application CN202011392502.5A)
Authority
CN
China
Prior art keywords
network
gru
matching
matrix
deep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011392502.5A
Other languages
Chinese (zh)
Inventor
Wang Hui (王慧)
Dai Xianhua (戴宪华)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202011392502.5A
Publication of CN112632236A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

Multi-turn dialogue, as used in chatbots and intelligent customer service, is a hot spot of current research. Among retrieval-based multi-turn dialogue methods, the Sequential Matching Network (SMN) is representative. The SMN uses a single-layer GRU network in its utterance-response matching part, but a single-layer GRU has limited ability to extract deep-level features, so the resulting encodings may contain noise. The match-accumulation part of the model uses a CNN, which focuses mainly on local information; its ability to extract the overall semantics of a natural-language sequence is therefore limited, and the information obtained after the CNN is incomplete. The invention provides an improved matching algorithm for the sequential matching network multi-turn dialogue model. The improvements are: (1) the single-layer GRU network is changed into a multi-layer deep network; (2) the aggregation of the feature matrices M1 and M2 is moved earlier; (3) the CNN is replaced with a GRU network; (4) the accuracy of the improved SMN is higher by about 2 percentage points.

Description

Improved sequence matching network-based multi-turn dialogue model
Technical Field
The invention relates to the field of natural language processing, in particular to an algorithm for selecting an optimal reply by using an answer selection model in a multi-turn dialogue system.
Background
In recent years, with the surge of interest in artificial intelligence, chatbots and intelligent customer service have come into wide use, and how to obtain accurate answers is a research hot spot. Serving a user is a multi-turn dialogue process that must consider not only the question itself but also the context of the conversation, because the context provides much useful information and plays an important role in building a coherent dialogue. A representative retrieval-based model for multi-turn dialogue is the Sequential Matching Network (SMN). The model comprises three parts: utterance-response matching, match accumulation, and match prediction. The overall idea is as follows: the candidate response is paired with each utterance in the context, matching is performed at two granularities (word level and sentence level), and the two matching vectors formed for each utterance-response pair are stacked together and fed into a convolutional neural network to obtain a new matching vector for that pair. The matching vectors of all utterance-response pairs in one round of multi-turn dialogue are then fed, in the chronological order of the utterances, into a GRU network to compute the matching score. However, the utterance-response matching part of the model encodes the word-embedded sentences with a single-layer GRU network. Because a single-layer GRU has limited ability to extract deep-level features, the resulting encodings may contain noise (useless semantic information). Moreover, the match-accumulation part uses a convolutional neural network to extract deeper matching information from the word- and sentence-level matching matrices, but a CNN focuses mainly on local information, has limited ability to capture the overall semantics of a natural-language sequence, and can lose information. The matching vectors produced after the CNN may therefore contain incomplete matching information.
Disclosure of Invention
The present invention is directed to solving at least one of the above problems.
Therefore, the invention aims to provide an improved sequential-matching-network-based multi-turn dialogue model, which replaces the single-layer GRU network after word embedding with a Deep GRU network and replaces the original CNN with a GRU network. The performance of the improved network on the Ubuntu dataset and the Douban dataset is clearly better than that of the original SMN model.
To achieve this aim, the technical scheme of the invention is as follows:
An improved sequential-matching-network-based multi-turn dialogue model, comprising the following steps:
s1, segmenting all contexts U and candidate replies r in a multi-turn conversation, converting words into word vectors, inputting the word vectors into a word embedding part, and obtaining word vector representations U ═ U ═ of the conversation U and the replies r through the word embedding part1,…,unu]And R ═ R1,…,rnu],
Figure BDA0002811387720000021
A word vector representation of the ith word in u and r, respectively.
S2. Perform multi-dimensional matching, encoding U and R with different structures. U is first encoded with a deep recurrent neural network (Deep GRU). Compared with the single-layer GRU used by the SMN model, the Deep GRU extracts deep semantic information more effectively, so its output represents U better. R is encoded with a Deep GRU-GATE (Deep GRU with gate) network, whose outer structure is the same as that of the Deep GRU, except that its recurrent units are GRU-GATE units (GRUs with an input gate) rather than the conventional GRUs used in the Deep GRU. From these encodings, the matching matrices $M_1$ and $M_2$ of each utterance-response pair are computed at two granularities: word level and sentence level.
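As an illustration, a minimal sketch of the two matching matrices is given below. The patent does not spell out their exact form; the sketch follows the original SMN formulation (word-level dot products for $M_1$, a bilinear match over the encoder hidden states for $M_2$), and the bilinear weight A and all shapes are assumptions.

```python
import numpy as np

def matching_matrices(U, R, H_u, H_r, A):
    """Word-level and sentence-level matching matrices for one
    utterance-response pair; shapes and the bilinear weight A are
    assumptions following the original SMN formulation.

    U:   (n_u, d)  word embeddings of the utterance
    R:   (n_r, d)  word embeddings of the response
    H_u: (n_u, h)  Deep GRU encodings of the utterance
    H_r: (n_r, h)  Deep GRU-GATE encodings of the response
    A:   (h, h)    trainable bilinear matching weight
    """
    M1 = U @ R.T          # M1[i, j] = u_i . r_j      (word level)
    M2 = H_u @ A @ H_r.T  # M2[i, j] = h_i^T A h'_j   (sentence level)
    return M1, M2

# toy usage with random stand-ins
rng = np.random.default_rng(0)
n_u, n_r, d, h = 5, 4, 8, 6
M1, M2 = matching_matrices(rng.normal(size=(n_u, d)), rng.normal(size=(n_r, d)),
                           rng.normal(size=(n_u, h)), rng.normal(size=(n_r, h)),
                           rng.normal(size=(h, h)))
print(M1.shape, M2.shape)  # (5, 4) (5, 4)
```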
S3. Aggregate the matching matrices $M_1$ and $M_2$ obtained from multi-dimensional matching, and extract deeper matching information between the utterance and the response through a neural network. This neural network is changed from the convolutional neural network of the original model to a GRU recurrent neural network. The GRU network encodes the aggregated matrix M and outputs a matrix containing deeper matching information, $H_1 = [h_{1,1}, \ldots, h_{1,n_u}]$.
S4. The matching matrix $H_1$ is input into another GRU network, which encodes it to obtain the output $H_m = [h'_1, \ldots, h'_n]$.
S5. The output $H_m$ of step S4 is transformed and then passed through a softmax layer to output the matching score.
Compared with the prior art, the invention has the beneficial effects that:
1) By changing the single-layer GRU network into a Deep GRU network, replacing the CNN with a GRU network, and aggregating the multi-dimensional matching matrices earlier, the method provides an improved sequence-matching-network-based multi-turn dialogue model.
2) Using this improved sequential-matching-network-based model, the multi-turn dialogue model achieves clearly higher accuracy than the original sequential matching network model.
3) The method can be applied to the intelligent customer service systems of e-commerce platforms; it brings clear gains in accuracy and algorithm stability and is well suited to practical engineering work.
Drawings
FIG. 1 is a flow diagram of an improved sequential matching network based multi-turn dialogue model according to one embodiment of the present invention
FIG. 2 is a schematic diagram of the structure of a sequential matching network multi-turn dialogue model according to an embodiment of the present invention
FIG. 3 is a schematic diagram of the structure of an improved sequential matching network multi-turn dialogue model according to an embodiment of the present invention
FIG. 4 is a schematic diagram of a Deep GRU network structure according to an embodiment of the present invention
FIG. 5 is a schematic diagram of Deep GRU-GATE network structure according to an embodiment of the present invention
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the invention is further illustrated below with reference to the figures and examples.
Examples
Fig. 2 shows the Sequential Matching Network (SMN) according to an embodiment of the present invention. The SMN is a retrieval-based answer selection model, and as the figure shows, its structure consists mainly of three parts: utterance-response matching, match accumulation, and match prediction. The utterance-response matching part begins with a word embedding part, whose main role is to convert words into vector representations. Its input is all the context utterances u and the candidate reply r in one round of multi-turn dialogue; after the word embedding layer, a single-layer GRU network produces two feature matrices $M_1$ and $M_2$. A matching matrix V containing deep-level matching information is then obtained through convolution and pooling operations, and finally a matching score is obtained through a softmax function.
Fig. 3 shows the network improved to address the deficiencies of the SMN. As the figure shows, the improved network replaces the single-layer GRU with a Deep GRU, moves the aggregation of $M_1$ and $M_2$ ahead of the neural network, and feeds the aggregated features $M_1$ and $M_2$ into a GRU network instead of the original CNN.
The method provided by the invention comprises the following specific steps:
a) All the context utterances u and the candidate reply r in a round of multi-turn dialogue are input into the word embedding part to obtain word vectors, and the word vector representations of all the words are then combined to form a word embedding matrix.
Wherein the specific method of the step a) is as follows:
Context u and reply r are represented as
$u = \{u_1, u_2, \ldots, u_{n_u}\}$
$r = \{r_1, r_2, \ldots, r_{n_r}\}$ (1)
where $u_i$ and $r_i$ denote the i-th word of u and r, respectively.
The word embedding part converts each word of an input sentence into a word vector of fixed length by looking it up in a word vector matrix (embedding matrix), and then combines the word vectors of all the words into the sentence's word embedding matrix.
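As a minimal sketch of this lookup (the toy vocabulary is an assumption; the 200-dimensional vectors come from the experiments below):

```python
import numpy as np

vocab = {"<unk>": 0, "how": 1, "are": 2, "you": 3}  # toy vocabulary (assumption)
emb_dim = 200                                       # dimension used in the experiments
E = np.random.default_rng(0).normal(size=(len(vocab), emb_dim))  # embedding matrix

def embed(sentence):
    """Look up each word's row in E and stack the rows into the
    sentence's word embedding matrix, as in step a)."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in sentence.lower().split()]
    return E[ids]                                   # shape: (num_words, emb_dim)

U = embed("How are you")
print(U.shape)  # (3, 200)
```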
b) The word embedding matrix U is input into a deep recurrent neural network (Deep GRU), which encodes it to obtain $M_1$; the word embedding matrix R is input into a deep recurrent neural network with an attention mechanism (Deep GRU-GATE), which encodes it to obtain $M_2$. The improved network first aggregates $M_1$ and $M_2$ and then extracts deeper matching information through a neural network.
Wherein the specific method of the step b) is as follows:
The word embedding matrix U is input into the Deep GRU network; the output of each GRU layer serves as the input of the next layer, and the weighted sum of the outputs of all layers is taken as the output of the whole Deep GRU network:
$h_i = \sum_{j=1}^{l} w_j h_i^{(j)}$ (2)
where $h_i^{(j)}$ denotes the hidden state of the j-th GRU layer of the Deep GRU at time i, $w_j$ is the weighting coefficient of the j-th layer, normalized by softmax and shared among all utterances and replies, l is the number of Deep GRU layers, and $h_i$ is the output of the whole Deep GRU network at time i.
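A short sketch of the weighted combination in equation (2); the layer outputs here are random stand-ins, and the equal initial weights are an assumption:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def deep_gru_output(layer_states, w_logits):
    """Equation (2): h_i = sum_j w_j * h_i^(j).

    layer_states: (l, T, h) hidden states of l stacked GRU layers over
    T time steps; w_logits: the l shared layer weights before softmax."""
    w = softmax(w_logits)                        # shared across utterances/replies
    return np.tensordot(w, layer_states, axes=1) # weighted sum, shape (T, h)

l, T, h = 3, 7, 200                              # 3 layers, as in the experiments
states = np.random.default_rng(1).normal(size=(l, T, h))
H = deep_gru_output(states, np.zeros(l))         # equal weights before training
print(H.shape)  # (7, 200)
```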
The attention vector in the Deep GRU-GATE network is derived from the encoded output $h = [h_1, \ldots, h_n]$ of the context U by a linear transformation:
$\mathrm{Attention} = L[h_1, \ldots, h_n]$ (3)
where $h_i$ is the output of the i-th utterance of the context after the Deep GRU network, and $L(\cdot)$ denotes a linear transformation. Since utterances that occur earlier in the context have less influence on the reply, only the utterance closest in time to the reply r, namely $u_n$, is considered here when generating the attention vector. Hence $L[h_1, \ldots, h_n] = h_n$, and the attention vector $\mathrm{Attention} = h_n$.
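The patent names the GRU-GATE unit (a GRU with an input gate) but does not give its update equations. One plausible form, offered purely as an assumption, adds a gate conditioned on the attention vector that scales the input before the standard GRU update of equations (4) below:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_gate_step(x_t, h_prev, attn, W, Wg, Ug, Wa):
    """One step of a hypothetical GRU-with-input-gate (assumed form).

    An extra gate g_t, conditioned on the attention vector attn (here
    h_n of the context), scales the input x_t before the GRU update."""
    g_t = sigmoid(x_t @ Wg + h_prev @ Ug + attn @ Wa)  # assumed input gate
    x_t = g_t * x_t                                    # gated input
    r_t = sigmoid(x_t @ W["xr"] + h_prev @ W["hr"])    # reset gate
    z_t = sigmoid(x_t @ W["xz"] + h_prev @ W["hz"])    # update gate
    h_hat = np.tanh(x_t @ W["xh"] + (r_t * h_prev) @ W["hh"])
    return (1 - z_t) * h_prev + z_t * h_hat

# toy usage with random stand-ins
rng = np.random.default_rng(2)
d = 6
W = {k: rng.normal(size=(d, d)) for k in ("xr", "xz", "xh", "hr", "hz", "hh")}
h_t = gru_gate_step(rng.normal(size=d), np.zeros(d), rng.normal(size=d),
                    W, rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                    rng.normal(size=(d, d)))
print(h_t.shape)  # (6,)
```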
The forward propagation formulas of the GRU network are:
$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1})$
$z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1})$
$\tilde{h}_t = \tanh(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}))$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$ (4)
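A direct NumPy transcription of equations (4) (bias terms are omitted, as in the formulas above; the shapes in the toy usage are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wxr, Whr, Wxz, Whz, Wxh, Whh):
    """One GRU step, exactly as in equations (4)."""
    r_t = sigmoid(x_t @ Wxr + h_prev @ Whr)            # reset gate
    z_t = sigmoid(x_t @ Wxz + h_prev @ Whz)            # update gate
    h_hat = np.tanh(x_t @ Wxh + (r_t * h_prev) @ Whh)  # candidate state
    return (1 - z_t) * h_prev + z_t * h_hat            # new hidden state

d, h = 8, 5
rng = np.random.default_rng(3)
h_t = gru_step(rng.normal(size=d), np.zeros(h),
               rng.normal(size=(d, h)), rng.normal(size=(h, h)),
               rng.normal(size=(d, h)), rng.normal(size=(h, h)),
               rng.normal(size=(d, h)), rng.normal(size=(h, h)))
print(h_t.shape)  # (5,)
```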
The aggregation operation splices the matrices $M_1$ and $M_2$ along the first dimension. The process is formulated as:
$Y = \sigma(X_i W_i + b_i)$ (5)
where $X_i$ is the spliced input and $Y$ is the matrix formed after aggregation (the matrix M used in the next step).
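A minimal sketch of the aggregation in equation (5): the two matching matrices are spliced along the first dimension and passed through an affine map with a sigmoid. The weight shapes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aggregate(M1, M2, W, b):
    """Equation (5): splice the two matching matrices along the first
    dimension, then Y = sigma(X W + b) gives the aggregated matrix M."""
    X = np.concatenate([M1, M2], axis=0)  # splice along the first dimension
    return sigmoid(X @ W + b)

rng = np.random.default_rng(4)
n_u, n_r = 5, 4
M1, M2 = rng.normal(size=(n_u, n_r)), rng.normal(size=(n_u, n_r))
M = aggregate(M1, M2, rng.normal(size=(n_r, n_r)), np.zeros(n_r))
print(M.shape)  # (10, 4)
```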
c) The aggregated matrix M is input into the modified network's GRU, which encodes it and outputs a matrix containing deeper matching information, $H_1 = [h_{1,1}, \ldots, h_{1,n_u}]$, where $h_{1,i}$ denotes the hidden state of the GRU network at time i.
d) The hidden state of the GRU network at the last time step, $h_{1,n_u}$, is taken as the input and fed into another GRU network, which encodes it to obtain the output $H_m = [h'_1, \ldots, h'_n]$.
This part plays two roles: (1) it models the dependency and temporal relationships between the utterances in the context; (2) it uses the chronological order of the context utterances $u_1, u_2, \ldots, u_{n_u}$ to supervise the accumulation of matching information in the GRU's hidden states at each time step. The reset and update gates of the GRU control how the matching information flows through the network, so that the useful part flows from the current time step to the next while the noise is filtered out, as the sketch below illustrates.
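A sketch of this chronological accumulation: the per-utterance match vectors are fed into a GRU in temporal order, and the gates decide what matching information survives to the next time step. The compact gru_step repeats equations (4) so the sketch is self-contained; all inputs and sizes are stand-ins:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W):
    """One GRU step (same equations as (4)); W holds the six weight blocks."""
    r = sigmoid(x @ W["xr"] + h @ W["hr"])
    z = sigmoid(x @ W["xz"] + h @ W["hz"])
    h_hat = np.tanh(x @ W["xh"] + (r * h) @ W["hh"])
    return (1 - z) * h + z * h_hat

def accumulate_matches(match_vectors, W, hidden):
    """Step d): feed each utterance's match vector into the GRU in
    chronological order; the update/reset gates pass useful matching
    information forward and filter out noise."""
    h = np.zeros(hidden)
    outputs = []
    for v in match_vectors:        # u_1, u_2, ..., u_nu in temporal order
        h = gru_step(v, h, W)
        outputs.append(h)
    return np.stack(outputs)       # H_m = [h'_1, ..., h'_n]

rng = np.random.default_rng(5)
n, d, hidden = 10, 50, 50          # 10 context utterances, 50-dim match vectors
W = {k: rng.normal(size=(d if k[0] == "x" else hidden, hidden)) * 0.1
     for k in ("xr", "xz", "xh", "hr", "hz", "hh")}
Hm = accumulate_matches(rng.normal(size=(n, d)), W, hidden)
print(Hm.shape)  # (10, 50)
```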
e) Finally, the output $H_m = [h'_1, \ldots, h'_n]$ is transformed and then passed through a softmax layer to output the matching score.
Wherein the specific method of the step e) is as follows:
For $H_m = [h'_1, \ldots, h'_n]$, define a function g(u, r):
$g(u,r) = \mathrm{softmax}(W L[h'_1, \ldots, h'_n] + b)$ (6)
where W and b are parameters and $L[h'_1, \ldots, h'_n]$ is a linear transformation of $h'_1, \ldots, h'_n$, for which there are three calculation methods: (1) directly select the last hidden state, i.e. $L[h'_1, \ldots, h'_n] = h'_n$; (2) take a linear combination of all hidden states, i.e. $L[h'_1, \ldots, h'_n] = \sum_i w_i h'_i$; (3) weight $h'_1, \ldots, h'_n$ with an attention mechanism, i.e. $L[h'_1, \ldots, h'_n] = \mathrm{Attention}[h'_1, \ldots, h'_n]$.
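A sketch of equation (6) with the three choices of $L[\cdot]$ listed above. The two-class softmax output (match / non-match) and the exact form of the attention weighting are assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def match_score(H, W, b, mode="last", w=None):
    """Equation (6): g(u, r) = softmax(W . L[h'_1..h'_n] + b).

    mode selects L: 'last' uses h'_n; 'linear' needs w of shape (n,);
    'attention' needs a query vector w of shape (h,) (assumed form)."""
    if mode == "last":            # (1) last hidden state
        L = H[-1]
    elif mode == "linear":        # (2) linear combination of all states
        L = w @ H
    else:                         # (3) attention-style weighting (assumption)
        L = softmax(H @ w) @ H
    return softmax(W @ L + b)

rng = np.random.default_rng(6)
n, h = 10, 50
H = rng.normal(size=(n, h))       # H_m = [h'_1, ..., h'_n]
score = match_score(H, rng.normal(size=(2, h)), np.zeros(2),
                    mode="linear", w=softmax(rng.normal(size=n)))
print(score)                      # probabilities for match / non-match
```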
Examples
The invention compares and analyzes the accuracy of the improved model and the SMN model on two public datasets, the Ubuntu dataset and the Douban dataset, as follows:
the Ubantu data set is an english data set and comprises three parts of a training set, a verification set and a test set, wherein the number of context-response pairs (context-response pairs) contained in each part is respectively as follows: 1 million, 50 ten thousand. Each context-reply pair in the training set contains one positive answer and one negative answer (the interfering answer), and each context-reply pair in the validation set and the test set contains one positive answer and nine negative answers. The double data set is an open-domain Chinese dialogue data set which also comprises a training set, a verification set and a test set.
The deep learning framework used for the experiments was TensorFlow. The word vector matrices used by the word embedding parts of the improved SMN model and the SMN model were trained on the Ubuntu dataset and the Douban dataset respectively with the word2vec method proposed by Mikolov et al.; each word vector has dimension 200. In the improved SMN model, the Deep GRU networks have 3 layers, and all GRU networks have 200 internal neurons. In the SMN model, the first GRU network, located in the multi-dimensional matching part, has 200 neurons, and the last GRU network has 50 neurons. All trainable parameters of the improved SMN model and the SMN model are updated with the Adam algorithm. During training, batch_size is set to 40, the maximum length of each utterance is set to 50, and the context of each dialogue contains 10 utterances. Since each question in the Ubuntu test set has only one correct answer, R2@1 and R10@1 are taken as evaluation metrics for the Ubuntu dataset; the experimental results are given in Table 1 below. In the Douban test set each context has more than one correct reply, so MAP and MRR are used as evaluation metrics; the results are given in Table 2 below.
TABLE 1 Experimental results of the improved sequential matching network multi-turn dialogue model on the Ubuntu dataset
Model                 R2@1    R10@1   R10@2   R10@5
SMN model             0.926   0.726   0.835   0.847
Improved SMN model    0.938   0.745   0.859   0.862
TABLE 2 Experimental results of the improved sequential matching network multi-turn dialogue model on the Douban dataset
Model                 MAP     MRR     R10@2   R10@5
SMN model             0.529   0.567   0.233   0.396
Improved SMN model    0.547   0.589   0.258   0.417
Table 1 shows that on the Ubuntu dataset the improved SMN model proposed herein outperforms the SMN model by 1.2% and 1.9% on the evaluation metrics R2@1 and R10@1, respectively; Table 2 shows that on the Douban dataset the improved SMN model outperforms it by 1.8% and 2.2% on MAP and MRR, respectively. The experimental results on the two test sets show that the invention's improvements to the deficiencies of the SMN model have a real practical effect, demonstrating the effectiveness of the improved model.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (1)

1. An improved sequence-matching-network-based multi-turn dialogue model, characterized by comprising the following steps:
s1, inputting all the contexts u and the candidate answers r in one-round multi-round conversation into a word embedding part to obtain word vectors, and then combining the word vector representations of all the words together to form a word embedding matrix.
S2. Input the word embedding matrix U into a deep recurrent neural network (Deep GRU) to encode it, obtaining $M_1$; input the word embedding matrix R into a deep recurrent neural network with an attention mechanism (Deep GRU-GATE) to encode it, obtaining $M_2$. The improved network first aggregates $M_1$ and $M_2$, then extracts deeper matching information through a neural network.
The output of the Deep GRU network is formulated as:
$h_i = \sum_{j=1}^{l} w_j h_i^{(j)}$
The aggregation operation splices the matrices $M_1$ and $M_2$ along the first dimension, formulated as:
$Y = \sigma(X_i W_i + b_i)$
s3, inputting the matrix M formed after aggregation into the GRU network after modification, and the GRU network encodes the matrix M after aggregation and outputs a matrix H containing deeper matching information1=[h1,1,…,h1,nu],h1,iIndicating the hidden state of the GRU network at the i-th time.
S4. Take the hidden state of the GRU network at the last time step, $h_{1,n_u}$, as the input to another GRU network, which encodes it to obtain the output $H_m = [h'_1, \ldots, h'_n]$.
S5. Finally, transform the output $H_m = [h'_1, \ldots, h'_n]$ and pass it through a softmax layer to output the matching score.
CN202011392502.5A 2020-12-02 2020-12-02 Improved sequence matching network-based multi-turn dialogue model Pending CN112632236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011392502.5A CN112632236A (en) 2020-12-02 2020-12-02 Improved sequence matching network-based multi-turn dialogue model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011392502.5A CN112632236A (en) 2020-12-02 2020-12-02 Improved sequence matching network-based multi-turn dialogue model

Publications (1)

Publication Number Publication Date
CN112632236A 2021-04-09

Family

ID=75308367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011392502.5A Pending CN112632236A (en) 2020-12-02 2020-12-02 Improved sequence matching network-based multi-turn dialogue model

Country Status (1)

Country Link
CN (1) CN112632236A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083729A (en) * 2019-04-26 2019-08-02 北京金山数字娱乐科技有限公司 A kind of method and system of picture search
CN110457675A (en) * 2019-06-26 2019-11-15 平安科技(深圳)有限公司 Prediction model training method, device, storage medium and computer equipment
CN110309287A (en) * 2019-07-08 2019-10-08 北京邮电大学 The retrieval type of modeling dialog round information chats dialogue scoring method
CN111274375A (en) * 2020-01-20 2020-06-12 福州大学 Multi-turn dialogue method and system based on bidirectional GRU network

Similar Documents

Publication Publication Date Title
CN109543180B (en) Text emotion analysis method based on attention mechanism
CN108681610B (en) generating type multi-turn chatting dialogue method, system and computer readable storage medium
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN109829299B (en) Unknown attack identification method based on depth self-encoder
CN110413785A (en) A kind of Automatic document classification method based on BERT and Fusion Features
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN111274375B (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN112115687B (en) Method for generating problem by combining triplet and entity type in knowledge base
CN112232087B (en) Specific aspect emotion analysis method of multi-granularity attention model based on Transformer
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN112784532B (en) Multi-head attention memory system for short text sentiment classification
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN115964467A (en) Visual situation fused rich semantic dialogue generation method
CN113297364A (en) Natural language understanding method and device for dialog system
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN112527966A (en) Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism
CN111914553B (en) Financial information negative main body judging method based on machine learning
Kesavan et al. Deep learning based automatic image caption generation
CN115841119B (en) Emotion cause extraction method based on graph structure
CN110597968A (en) Reply selection method and device
CN114036298B (en) Node classification method based on graph convolution neural network and word vector
CN113887836B (en) Descriptive event prediction method integrating event environment information
CN112528168B (en) Social network text emotion analysis method based on deformable self-attention mechanism
CN113807079A (en) End-to-end entity and relation combined extraction method based on sequence-to-sequence
CN113468874B (en) Biomedical relation extraction method based on graph convolution self-coding

Legal Events

Code   Title
PB01   Publication
SE01   Entry into force of request for substantive examination