CN110866403A - End-to-end conversation state tracking method and system based on convolution cycle entity network - Google Patents

End-to-end conversation state tracking method and system based on convolution cycle entity network

Info

Publication number
CN110866403A
CN110866403A
Authority
CN
China
Prior art keywords
sentence
convolution
matrix
semantic
semantic slot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810916744.6A
Other languages
Chinese (zh)
Other versions
CN110866403B (en)
Inventor
颜永红
何峻青
赵学敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS filed Critical Institute of Acoustics CAS
Priority to CN201810916744.6A priority Critical patent/CN110866403B/en
Publication of CN110866403A publication Critical patent/CN110866403A/en
Application granted granted Critical
Publication of CN110866403B publication Critical patent/CN110866403B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an end-to-end dialog state tracking method and system based on a convolutional recurrent entity network, comprising the following steps: step 1) representing a dialog as a set of sentence matrices D = {S_1, ..., S_t}, where S_i (1 ≤ i ≤ t) is the ith sentence matrix, composed of a number of word vectors; step 2) passing the matrix set D through a trainable convolutional neural network module and obtaining fixed-length sentence vectors after max pooling; step 3) encoding each fixed-length sentence vector with a dynamic memory, and using the last hidden layer h_t of the dynamic memory to represent the entire dialog; step 4) for each predefined semantic slot, building a one-layer fully connected neural network from h_t to all possible values of that slot, obtaining the probability distribution of each semantic slot over its values; and step 5) taking the value with the maximum probability as the prediction result of the semantic slot to obtain the current dialog state. The invention can automatically learn text representations related to the semantic slots and improves dialog state tracking performance.

Description

End-to-end conversation state tracking method and system based on convolution cycle entity network
Technical Field
The invention relates to the field of dialog state tracking in dialog systems, and in particular to an end-to-end dialog state tracking method and system based on a convolutional recurrent entity network.
Background
Dialog state tracking is an important component of task-oriented dialog systems; its goal is to maintain and update the user's goals, i.e., the values of the various semantic slots in a particular task, throughout the dialog. For example, in a restaurant reservation task, querying a restaurant requires three semantic slots: cuisine, price, and restaurant location; dialog state tracking then updates the values of these three slots from the user input at every turn of a multi-turn dialog.
Given the user input text and the dialog history text, how to represent and update the dialog state is a long-standing research area, recently combined with deep learning and neural networks to eliminate manual feature engineering. Current neural-network-based methods mainly include the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), the Long Short-Term Memory unit (LSTM), the Memory Network (MemNN), and the Neural Belief Tracker (NBT). The first four methods do not adapt the network structure to the specific task of state tracking; they are applied directly and lack task specificity. The last method, NBT, requires preprocessing according to the semantic slots and builds a classifier for every semantic slot value, so it is not applicable to slots with a large number of possible values. In addition, the performance of end-to-end methods on DSTC2, a standard dataset commonly used in the field, is still not ideal, with the highest accuracy reaching only 73.4%.
Disclosure of Invention
The invention aims to solve the problems that existing methods do not adapt the network structure to the specific task of state tracking and therefore lack task specificity, that they are not applicable to semantic slots with a large number of possible values, and that end-to-end methods still perform unsatisfactorily on the commonly used standard dataset DSTC2.
In order to achieve the above object, the present invention provides an end-to-end dialog state tracking method based on a convolutional recurrent entity network, the method comprising:

step 1) representing a dialog as a set of sentence matrices D = {S_1, ..., S_t}, where S_i (1 ≤ i ≤ t) is the ith sentence matrix, composed of a number of word vectors;

step 2) passing the matrix set D through a trainable convolutional neural network (CNN) module and obtaining fixed-length sentence vectors after max pooling;

step 3) encoding each fixed-length sentence vector with a dynamic memory, and using the last hidden layer h_t of the dynamic memory to represent the entire dialog;

step 4) for each predefined semantic slot, building a one-layer fully connected neural network from h_t to all possible values of that slot, obtaining the probability distribution of each semantic slot over its values;

step 5) taking the value with the maximum probability as the prediction result of the semantic slot to obtain the current dialog state.
As an improvement of the above method, the step 1) comprises:

step 1-1) cutting the dialog data into t sentences by dialog turn, wherein the ith sentence (1 ≤ i ≤ t) contains a number of words, each word is represented by a fixed-length word vector, and the ith sentence is represented as a sentence matrix S_i; for each sentence matrix S_i, the number of rows of S_i is the number of word vectors contained in the sentence, and the number of columns of S_i is the dimension of the word vector;

step 1-2) representing the dialog data as a set of sentence matrices D = {S_1, ..., S_t}.
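As an illustrative sketch of step 1 (a hypothetical toy setup, not the patented implementation: the vocabulary, dimensions, and random vectors below are made up), each sentence is mapped to a matrix whose rows are word vectors:

```python
import numpy as np

# Hypothetical toy vocabulary of 50-dimensional word vectors; in practice the
# vectors would come from a trained embedding table.
rng = np.random.default_rng(0)
dim = 50
vocab = {w: rng.normal(size=dim) for w in
         "how can i help you find a chinese restaurant".split()}

def sentence_matrix(sentence):
    """Stack the word vectors of one sentence into an (n_words x dim) matrix S_i."""
    return np.stack([vocab[w] for w in sentence.lower().split()])

# A dialog D = {S_1, ..., S_t}: one matrix per turn.
dialog = ["How can I help you", "find a chinese restaurant"]
D = [sentence_matrix(u) for u in dialog]
print([S.shape for S in D])  # rows = words in the sentence, cols = vector dim
```

The two matrices have 5 and 4 rows respectively, one per word, and 50 columns each, matching the stated row/column convention.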
As an improvement of the above method, the step 2) comprises:

step 2-1) for a convolution kernel W_m of height z, sliding it from top to bottom over the sentence matrix S_i with a stride of 1, computing at each step the dot-product sum x_i of the overlapping parts of the two matrices, obtaining a vector X of length N - z + 1:

x_i = ReLU(W_m · S_{i:i+z-1} + b_m)   (1)
X = [x_1, x_2, ..., x_{N-z+1}]   (2)

wherein · is the dot-product operation, S_{i:i+z-1} denotes rows i through i+z-1 of the sentence matrix, [.] denotes element concatenation, ReLU is the rectified linear unit, b_m is the bias of the corresponding convolution kernel, N is the number of words in the sentence, i is the ith step of the kernel's slide, and m is the index of the convolution kernel;
step 2-2) applying max pooling to the vector X, taking the maximum value to obtain an element c_m:

c_m = max(X)   (3)

step 2-3) performing the convolution with several different kernels, each of width equal to the word-vector dimension, repeating steps 2-1) and 2-2) for each, and concatenating the maxima c_m of the resulting vectors X to obtain the sentence vector s:

s = [c_1, c_2, ..., c_k]   (4)

where k is the total number of convolution kernels.
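The convolution and max pooling of step 2 (equations (1)-(4)) can be sketched as follows; this is a NumPy illustration under assumed shapes and random values, not the trained network, and the kernel heights and dimensions are hypothetical:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_sentence(S, kernels, biases):
    """Slide each kernel W_m (height z, width = word dim) down the sentence
    matrix S with stride 1 and ReLU (eq. 1-2), then max-pool (eq. 3) and
    concatenate the pooled values into the sentence vector s (eq. 4)."""
    pooled = []
    for W_m, b_m in zip(kernels, biases):
        z, N = W_m.shape[0], S.shape[0]
        X = np.array([relu(np.sum(W_m * S[i:i + z]) + b_m)
                      for i in range(N - z + 1)])   # vector of length N - z + 1
        pooled.append(X.max())                      # c_m = max(X)
    return np.array(pooled)                         # s = [c_1, ..., c_k]

rng = np.random.default_rng(1)
dim, N = 50, 7
S = rng.normal(size=(N, dim))                       # a 7-word sentence matrix
kernels = [rng.normal(size=(z, dim)) for z in (2, 3, 4)]  # heights differ
biases = [0.0, 0.0, 0.0]
s = conv_sentence(S, kernels, biases)
print(s.shape)  # (3,) - one element per convolution kernel
```

The fixed length of s depends only on the number of kernels k, not on the sentence length, which is what makes the subsequent dynamic-memory encoding possible.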
As an improvement of the above method, the step 3) comprises:

step 3-1) inputting the sentence vector s obtained from each sentence into the dynamic memory;
step 3-2) for the t-th input sentence vector s_t, the hidden layer h_j^t of dynamic memory block j is computed as follows:

g_j^t = σ(s_t^T h_j^{t-1} + s_t^T w_j)   (5)
h̃_j^t = φ(U h_j^{t-1} + V w_j + W s_t)   (6)
h_j^t = h_j^{t-1} + g_j^t ⊙ h̃_j^t   (7)
h_j^t = h_j^t / ||h_j^t||   (8)

wherein g_j^t is the update gate, σ is the sigmoid function, w_j is the trainable key vector of each block, h̃_j^t is the updated candidate state, φ is an arbitrary non-linear activation function, h_j^{t-1} is the hidden layer of block j at the previous step, U, V and W are trainable matrix parameters, and T denotes matrix transposition;

step 3-3) concatenating the hidden layer vectors of all blocks at this moment to obtain the hidden layer h_t:

h_t = [h_1^t, h_2^t, ..., h_K^t]   (9)

where K is the number of memory blocks;
step 3-4) taking the dynamic memory hidden layer h_t of the last sentence as the representation of this dialog turn.
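The dynamic-memory update of step 3 can be sketched as below, following the Recurrent Entity Network form of the gate, candidate, and normalization equations; the dimensions, number of blocks, and random parameters are hypothetical placeholders for trained values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def entnet_step(s_t, h_prev, keys, U, V, W):
    """One dynamic-memory update: per block j, a scalar gate
    g_j = sigmoid(s^T h_j + s^T w_j), a ReLU candidate
    h~_j = ReLU(U h_j + V w_j + W s), an additive gated update,
    then per-block normalization."""
    h_new = []
    for h_j, w_j in zip(h_prev, keys):
        g = sigmoid(s_t @ h_j + s_t @ w_j)
        cand = np.maximum(U @ h_j + V @ w_j + W @ s_t, 0.0)
        h = h_j + g * cand
        h = h / (np.linalg.norm(h) + 1e-8)
        h_new.append(h)
    return h_new

rng = np.random.default_rng(2)
d, n_blocks = 20, 4
U, V, W = (rng.normal(size=(d, d), scale=0.1) for _ in range(3))
keys = [rng.normal(size=d) for _ in range(n_blocks)]
h = [k.copy() for k in keys]            # blocks initialised with their keys
for _ in range(3):                      # feed three sentence vectors
    h = entnet_step(rng.normal(size=d), h, keys, U, V, W)
h_t = np.concatenate(h)                 # h_t = [h_1; ...; h_K]
print(h_t.shape)  # (80,)
```

Because each block carries its own key, different blocks can specialize in tracking different semantic slots, which is the design rationale stated in the detailed description.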
As an improvement of the above method, the step 4) comprises:

step 4-1) building a one-layer neural network from the hidden layer h_t to all possible values of each semantic slot, including the two extra choices None and Dontcare;

step 4-2) normalizing with Softmax to obtain the probability y' of each possible value:

y' = Softmax(R h_t)   (10)

wherein R is the parameter matrix mapping the dynamic memory hidden layer h_t at this moment to the semantic slot, and y' is the probability estimate over all values of the slot.
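The per-slot classifier of step 4 (equation (10)) reduces to a linear map followed by Softmax; the slot values and the matrix R below are hypothetical stand-ins for trained parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # shift for numerical stability
    return e / e.sum()

# Hypothetical slot definition: domain values plus the two extra choices.
food_values = ["chinese", "italian", "french", "none", "dontcare"]

rng = np.random.default_rng(3)
h_t = rng.normal(size=80)                     # dialog representation
R = rng.normal(size=(len(food_values), 80))   # per-slot parameter matrix (eq. 10)
y = softmax(R @ h_t)                          # probability of every value
pred = food_values[int(np.argmax(y))]         # step 5: argmax is the prediction
print(round(float(y.sum()), 6))  # 1.0 - a proper probability distribution
```

One such R (and output dimension) exists per predefined slot, so slots with many possible values only cost extra rows of R rather than a separate model.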
As an improvement of the above method, the step 5) comprises:

during training, the cross entropy of the true probability distribution y and the predicted probability distribution y' is used as the loss function, and the loss function is minimized to adjust all trainable parameters, including the convolution kernels in the convolutional network; the parameters are adjusted with the back-propagation algorithm:

loss = Σ_{i=1}^{M} loss_i   (11)
loss_i = -Σ_{j=1}^{V_i} y_{ij} log(y'_{ij})   (12)

wherein M is the number of predefined semantic slots, i indexes the ith semantic slot, y'_i is the probability estimate over all values of the ith slot, y_i is the true probability distribution over those values, V_i is the number of values of the ith slot, j indexes the jth value, and y_{ij} and y'_{ij} are the probabilities of the jth value of the ith slot under the true and estimated distributions, respectively;

during testing, all trainable parameters are loaded with the corresponding values from the trained model; for each semantic slot, the option with the maximum probability is taken as the prediction result, giving the predicted dialog state.
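The training objective of step 5, a cross entropy summed over the slots, can be computed as in this small sketch with made-up slot distributions (two hypothetical slots with 3 and 4 values, one-hot targets):

```python
import numpy as np

def slot_cross_entropy(y_true, y_pred):
    """Summed cross-entropy over M slots:
    loss = sum_i ( -sum_j y_ij * log(y'_ij) )."""
    return sum(-(t * np.log(p + 1e-12)).sum() for t, p in zip(y_true, y_pred))

y_true = [np.array([1., 0., 0.]), np.array([0., 0., 1., 0.])]
y_pred = [np.array([.7, .2, .1]), np.array([.1, .1, .6, .2])]
loss = slot_cross_entropy(y_true, y_pred)
print(round(loss, 4))  # -ln(0.7) - ln(0.6) = 0.8675
```

With one-hot targets the loss collapses to the negative log-probability the model assigns to each correct value, so minimizing it directly pushes probability mass toward the true slot values.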
The invention also provides an end-to-end dialog state tracking system based on a convolutional recurrent entity network, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the steps of the above method when executing the program.
The invention has the advantages that:
1. the method can automatically learn text representations related to the semantic slots; through the convolutional neural network, it obtains slot-related representations at the semantic level;

2. for state tracking, the invention uses a recurrent entity network with blocks to encode slot-related information and thereby update the state, achieving better results than the commonly used RNN, LSTM, etc.;

3. the invention uses fewer parameters in both time and space complexity and is superior to existing models;

4. the invention uses a convolutional recurrent entity network designed specifically for the dialog tracking task, improving dialog state tracking performance.
Drawings
FIG. 1 is a block diagram of the method of the present invention;
FIG. 2 is a schematic diagram of the structure of the method of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Recently, the Recurrent Entity Network has been proposed; for question-answering tasks it can effectively track the answers to questions over a given story, with performance greatly superior to LSTM and MemNN. The invention therefore adapts this model to the dialog state tracking task and proposes the Convolutional Recurrent Entity Network (CREN). The network comprises 3 major parts: a convolutional neural network, a dynamic memory, and semantic slot classifiers. The convolutional neural network produces a slot-related representation of each sentence; the dynamic memory further encodes and updates the sentence representations of the entire dialog; and a semantic slot classifier estimates a probability for each value of every predefined semantic slot. The model can automatically learn slot-related text representations, and the dynamic memory can encode the values of the semantic slots with different blocks (Block), thereby updating the state.
The invention discloses an end-to-end dialog state tracking method based on a convolutional recurrent entity network. For example, assume a restaurant-domain dialog system with two semantic slots: food and location. Given a one-turn dialog input {"How can I help you?", "find a Chinese restaurant in the south part of town"}, the convolutional recurrent entity network outputs {food: Chinese, location: south}.
The structure of the entire convolutional recurrent entity network is shown in fig. 1. A dialog is D = {S_1, ..., S_t}, where t is the number of sentences and S_t is a sentence. Each word of each sentence is first represented by its word vector, and the whole sentence matrix then passes through a trainable CNN module with max pooling to obtain a fixed-length vector. Each resulting sentence vector is then encoded with an RNN variant, the Dynamic Memory, and the last hidden layer h_t of the dynamic memory represents the entire dialog. Finally, for each semantic slot, a one-layer fully connected neural network from h_t to all possible values of that slot yields the probability distribution over its values. The value with the maximum probability is taken as the prediction result of the slot, giving the current dialog state.
The dynamic memory is divided into different blocks; each block's hidden layer is computed separately, and the hidden layers are finally concatenated together.
As shown in FIG. 2, each block in the dynamic memory has its own key vector w_j. For a block j and an input sentence vector s_t, the key and the previous hidden layer h_{t-1} are first used together to compute the value g_j of the update gate and the candidate hidden state h̃_j; the block's hidden state is then computed, and the hidden states of all blocks are concatenated to obtain the complete h_t. In the figure, f_θ denotes the update formula.
In the above technical solution, the method specifically includes:
step S1) the dialog data is cut by turn and expressed as the set D of dialog sentences from the start of the dialog up to the current turn, together with the corresponding dialog state Slot. For each turn, the sentence set is D = {u_1, u_2, ..., u_i}. Each word of each sentence is represented by a fixed-length word vector, so that a dialog is represented as a set of sentence matrices D = {S_1, S_2, ..., S_i}, where the number of rows of each sentence matrix S_i is the set maximum sentence length (the number of word vectors contained in the sentence) and the number of columns of S_i is the dimension of the word vector;
step S2) each sentence matrix S_t is passed through the convolutional neural network as follows: a convolution kernel W_m of height z is slid from top to bottom over the whole matrix with a stride of 1, and at each step the dot-product sum of the overlapping parts of the two matrices is computed and activated to give x_i:

x_i = ReLU(W_m · S_{i:i+z-1} + b_m)   (1)

finally a vector X of length N - z + 1 is obtained, where N is the number of words in the sentence:

X = [x_1, x_2, ..., x_{N-z+1}]   (2)
then max pooling is applied, taking the maximum value to obtain an element c_m:

c_m = max(X)   (3)

the convolution is performed with several kernels of different heights, each of width equal to the word-vector length, and the maxima c_m of the vectors X obtained from the convolutions are concatenated into the sentence vector s:

s = [c_1, c_2, ..., c_k]   (4)

wherein · is the dot-product operation, [.] denotes element concatenation, ReLU denotes the Rectified Linear Unit, k is the total number of convolution kernels, m indexes the mth convolution, and b_m is the bias of the corresponding kernel;
step S3) the sentence vector s obtained from each sentence is input into the dynamic memory, and the dynamic memory hidden layer of the last sentence is taken as the representation of this dialog turn.

For the t-th input sentence vector s_t, the hidden layer h_j^t of dynamic memory block j is computed as:

g_j^t = σ(s_t^T h_j^{t-1} + s_t^T w_j)   (5)
h̃_j^t = φ(U h_j^{t-1} + V w_j + W s_t)   (6)
h_j^t = h_j^{t-1} + g_j^t ⊙ h̃_j^t   (7)
h_j^t = h_j^t / ||h_j^t||   (8)

wherein g_j^t is the update gate, σ is the sigmoid function, w_j is the (trainable) key vector of each block, φ is an arbitrary non-linear activation function (here ReLU), h̃_j^t is the updated candidate state, h_j^{t-1} is the hidden layer of block j at the previous step, U, V and W are trainable matrix parameters, and T denotes matrix transposition.

The hidden layer vectors of all blocks at this moment are then concatenated to obtain the hidden layer h_t:

h_t = [h_1^t, h_2^t, ..., h_K^t]   (9)

where K is the number of memory blocks.
step S4) for all possible values of each semantic slot (including the two extra choices None and Dontcare), a one-layer neural network is built, and Softmax normalization then yields the probability of each possible value:

y' = Softmax(R h_t)   (10)

wherein R is the parameter matrix mapping the dynamic memory hidden layer h_t at this moment to the semantic slot, and y' is the probability estimate over all values of the slot.
step S5) during training, the cross entropy of the true probability distribution y and the predicted probability distribution y' is used as the loss function, which is minimized to adjust all trainable parameters, including the convolution kernels in the convolutional network:

loss = Σ_{i=1}^{M} loss_i   (11)
loss_i = -Σ_{j=1}^{V_i} y_{ij} log(y'_{ij})   (12)

wherein M is the number of predefined semantic slots, i indexes the ith slot, y'_i is the probability estimate over all values of the ith slot, y_i is the true probability distribution over those values, V_i is the number of values of the ith slot, j indexes the jth value, and y_{ij} and y'_{ij} are the probabilities of the jth value of the ith slot under the true and estimated distributions, respectively.

During testing, all trainable parameters are loaded with the corresponding values from the trained model; for each semantic slot, the option with the maximum probability is taken as the prediction result, giving the predicted dialog state.
The dialog state tracking method of the invention not only effectively tracks the values of the semantic slots throughout the dialog but also improves the accuracy of the resulting dialog state.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the invention and are not limiting. Although the invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. An end-to-end dialog state tracking method based on a convolutional recurrent entity network, comprising the following steps:

step 1) representing a dialog as a set of sentence matrices D = {S_1, ..., S_t}, where S_i (1 ≤ i ≤ t) is the ith sentence matrix, composed of a number of word vectors;

step 2) passing the matrix set D through a trainable convolutional neural network (CNN) module and obtaining fixed-length sentence vectors after max pooling;

step 3) encoding each fixed-length sentence vector with a dynamic memory, and using the last hidden layer h_t of the dynamic memory to represent the entire dialog;

step 4) for each predefined semantic slot, building a one-layer fully connected neural network from h_t to all possible values of that slot, obtaining the probability distribution of each semantic slot over its values;

step 5) taking the value with the maximum probability as the prediction result of the semantic slot to obtain the current dialog state.
2. The end-to-end dialog state tracking method based on a convolutional recurrent entity network as claimed in claim 1, wherein the step 1) comprises:

step 1-1) cutting the dialog data into t sentences by dialog turn, wherein the ith sentence (1 ≤ i ≤ t) contains a number of words, each word is represented by a fixed-length word vector, and the ith sentence is represented as a sentence matrix S_i; the number of rows of each S_i is the number of word vectors contained in the sentence, and the number of columns of S_i is the dimension of the word vector;

step 1-2) representing the dialog data as a set of sentence matrices D = {S_1, ..., S_t}.
3. The end-to-end dialog state tracking method based on a convolutional recurrent entity network as claimed in claim 2, wherein the step 2) comprises:

step 2-1) for a convolution kernel W_m of height z, sliding it from top to bottom over the sentence matrix S_i with a stride of 1, computing at each step the dot-product sum x_i of the overlapping parts of the two matrices, obtaining a vector X of length N - z + 1:

x_i = ReLU(W_m · S_{i:i+z-1} + b_m)   (1)
X = [x_1, x_2, ..., x_{N-z+1}]   (2)

wherein · is the dot-product operation, S_{i:i+z-1} denotes rows i through i+z-1 of the sentence matrix, [.] denotes element concatenation, ReLU is the rectified linear unit, b_m is the bias of the corresponding convolution kernel, N is the number of words in the sentence, i is the ith step of the kernel's slide, and m is the index of the convolution kernel;
step 2-2) applying max pooling to the vector X, taking the maximum value to obtain an element c_m:

c_m = max(X)   (3)

step 2-3) performing the convolution with several different kernels, each of width equal to the word-vector dimension, repeating steps 2-1) and 2-2) for each, and concatenating the maxima c_m of the resulting vectors X to obtain the sentence vector s:

s = [c_1, c_2, ..., c_k]   (4)

where k is the total number of convolution kernels.
4. The end-to-end dialog state tracking method based on a convolutional recurrent entity network as claimed in claim 3, wherein the step 3) comprises:

step 3-1) inputting the sentence vector s obtained from each sentence into the dynamic memory;

step 3-2) for the t-th input sentence vector s_t, the hidden layer h_j^t of dynamic memory block j is computed as follows:

g_j^t = σ(s_t^T h_j^{t-1} + s_t^T w_j)   (5)
h̃_j^t = φ(U h_j^{t-1} + V w_j + W s_t)   (6)
h_j^t = h_j^{t-1} + g_j^t ⊙ h̃_j^t   (7)
h_j^t = h_j^t / ||h_j^t||   (8)

wherein g_j^t is the update gate, σ is the sigmoid function, w_j is the trainable key vector of each block, h̃_j^t is the updated candidate state, φ is an arbitrary non-linear activation function, h_j^{t-1} is the hidden layer of block j at the previous step, U, V and W are trainable matrix parameters, and T denotes matrix transposition;

step 3-3) concatenating the hidden layer vectors of all blocks at this moment to obtain the hidden layer h_t:

h_t = [h_1^t, h_2^t, ..., h_K^t]   (9)

where K is the number of memory blocks;

step 3-4) taking the dynamic memory hidden layer h_t of the last sentence as the representation of this dialog turn.
5. The end-to-end dialog state tracking method based on a convolutional recurrent entity network as claimed in claim 4, wherein the step 4) comprises:

step 4-1) building a one-layer neural network from the hidden layer h_t to all possible values of each semantic slot, including the two extra choices None and Dontcare;

step 4-2) normalizing with Softmax to obtain the probability y' of each possible value:

y' = Softmax(R h_t)   (10)

wherein R is the parameter matrix mapping the dynamic memory hidden layer h_t at this moment to the semantic slot, and y' is the probability estimate over all values of the slot.
6. The end-to-end dialog state tracking method based on a convolutional recurrent entity network as claimed in claim 5, wherein the step 5) comprises:

during training, using the cross entropy of the true probability distribution y and the predicted probability distribution y' as the loss function, minimizing the loss function to adjust all trainable parameters, including the convolution kernels in the convolutional network, and adjusting the parameters with the back-propagation algorithm:

loss = Σ_{i=1}^{M} loss_i   (11)
loss_i = -Σ_{j=1}^{V_i} y_{ij} log(y'_{ij})   (12)

wherein M is the number of predefined semantic slots, i indexes the ith semantic slot, y'_i is the probability estimate over all values of the ith slot, y_i is the true probability distribution over those values, V_i is the number of values of the ith slot, j indexes the jth value, and y_{ij} and y'_{ij} are the probabilities of the jth value of the ith slot under the true and estimated distributions, respectively;

during testing, loading all trainable parameters with the corresponding values from the trained model, and for each semantic slot, taking the option with the maximum probability as the prediction result to obtain the predicted dialog state.
7. An end-to-end dialog state tracking system based on a convolutional recurrent entity network, comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 6.
CN201810916744.6A 2018-08-13 2018-08-13 End-to-end conversation state tracking method and system based on convolution cycle entity network Active CN110866403B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810916744.6A CN110866403B (en) 2018-08-13 2018-08-13 End-to-end conversation state tracking method and system based on convolution cycle entity network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810916744.6A CN110866403B (en) 2018-08-13 2018-08-13 End-to-end conversation state tracking method and system based on convolution cycle entity network

Publications (2)

Publication Number Publication Date
CN110866403A true CN110866403A (en) 2020-03-06
CN110866403B CN110866403B (en) 2021-06-08

Family

ID=69650941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810916744.6A Active CN110866403B (en) 2018-08-13 2018-08-13 End-to-end conversation state tracking method and system based on convolution cycle entity network

Country Status (1)

Country Link
CN (1) CN110866403B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287197A (en) * 2020-09-23 2021-01-29 昆明理工大学 Method for detecting sarcasm of case-related microblog comments described by dynamic memory cases
CN114996479A (en) * 2022-06-21 2022-09-02 中国科学院声学研究所 Dialog state tracking method and system based on enhancement technology
CN111462749B (en) * 2020-03-20 2023-07-21 北京邮电大学 End-to-end dialogue system and method based on dialogue state guidance and knowledge base retrieval

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105788593A (en) * 2016-02-29 2016-07-20 中国科学院声学研究所 Method and system for generating dialogue strategy
CN105845137A (en) * 2016-03-18 2016-08-10 中国科学院声学研究所 Voice communication management system
CN106126596A (en) * 2016-06-20 2016-11-16 中国科学院自动化研究所 A kind of answering method based on stratification memory network
US20180137854A1 (en) * 2016-11-14 2018-05-17 Xerox Corporation Machine reading method for dialog state tracking
CN108282587A (en) * 2018-01-19 2018-07-13 重庆邮电大学 Mobile customer service dialogue management method under being oriented to strategy based on status tracking

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAKASHI USHIO et al.: "Recurrent Convolutional Neural Networks for Structured Speech Act Tagging", 2016 IEEE Spoken Language Technology Workshop *
REN Hang et al.: "Research on Spoken Dialogue State Tracking", Network New Media Technology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462749B (en) * 2020-03-20 2023-07-21 Beijing University of Posts and Telecommunications End-to-end dialogue system and method based on dialogue state guidance and knowledge base retrieval
CN112287197A (en) * 2020-09-23 2021-01-29 Kunming University of Science and Technology Method for detecting sarcasm in case-related microblog comments based on dynamic memory case description
CN112287197B (en) * 2020-09-23 2022-07-19 Kunming University of Science and Technology Method for detecting sarcasm in case-related microblog comments based on dynamic memory case description
CN114996479A (en) * 2022-06-21 2022-09-02 Institute of Acoustics, Chinese Academy of Sciences Dialogue state tracking method and system based on enhancement technology
CN114996479B (en) * 2022-06-21 2024-08-09 Institute of Acoustics, Chinese Academy of Sciences Dialogue state tracking method and system based on enhancement technology

Also Published As

Publication number Publication date
CN110866403B (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN111538908B (en) Search ranking method and device, computer equipment and storage medium
US11487950B2 (en) Autonomous evolution intelligent dialogue method, system, and device based on a game with a physical environment
EP3602413B1 (en) Projection neural networks
CN112686058B (en) BERT embedded speech translation model training method and system, and speech translation method and equipment
KR102483643B1 Method and apparatus for training a model and for recognition based on the model
CN109710915B (en) Method and device for generating paraphrase sentences
WO2019091020A1 (en) Weight data storage method, and neural network processor based on method
CN110866403B (en) End-to-end conversation state tracking method and system based on convolution cycle entity network
CN111737426B (en) Method for training question-answering model, computer equipment and readable storage medium
US11580406B2 (en) Weight initialization method and apparatus for stable learning of deep learning model using activation function
CN111159419B (en) Knowledge tracking data processing method, system and storage medium based on graph convolution
CN112116090A (en) Neural network structure searching method and device, computer equipment and storage medium
KR20190041819A (en) Apparatus and method for convolution operation of convolution neural network
CN113657595B (en) Neural network accelerator based on neural network real-time pruning
CN110297885B (en) Method, device and equipment for generating real-time event abstract and storage medium
CN113221530B (en) Text similarity matching method and device, computer equipment and storage medium
US20220383119A1 (en) Granular neural network architecture search over low-level primitives
CN109508784A (en) Design method for a neural network activation function
CN116775843A (en) Question-answer pair evaluation data generation method, question-answer pair evaluation data generation device, computer equipment and storage medium
CN110837567A (en) Method and system for embedding knowledge graph
CN112131261A (en) Community query method and device based on community network and computer equipment
CN116136870A (en) Intelligent social conversation method and conversation system based on enhanced entity representation
CN117828049A (en) Data processing method and related device
CN112000788A (en) Data processing method and device and computer readable storage medium
CN114861671A (en) Model training method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant