CN112632236A - Improved sequence matching network-based multi-turn dialogue model - Google Patents

Improved sequence matching network-based multi-turn dialogue model

Info

Publication number
CN112632236A
CN112632236A (application CN202011392502.5A)
Authority
CN
China
Prior art keywords
network
gru
matching
matrix
deep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011392502.5A
Other languages
Chinese (zh)
Inventor
Wang Hui (王慧)
Dai Xianhua (戴宪华)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202011392502.5A
Publication of CN112632236A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

Multi-turn dialogue, as used in chatbots and intelligent customer service, is a hot spot of current research. Among retrieval-based multi-turn dialogue methods, the Sequential Matching Network (SMN) is representative. The SMN uses a single-layer GRU network in its utterance-response matching part, but a single-layer GRU has limited ability to extract deep-level features, so the resulting encodings may contain noise. The match-accumulation part of the model uses a CNN, which focuses mainly on local information; its ability to extract the overall semantics of a natural-language sequence is therefore limited, and the information obtained after the CNN is incomplete. The invention provides an improved matching algorithm for the sequential matching network multi-turn dialogue model. The improvements are: (1) the single-layer GRU network is changed into a multi-layer deep network; (2) the aggregation of the feature matrices M1 and M2 is moved earlier; (3) the CNN is replaced with a GRU network; (4) the accuracy of the improved SMN is higher by about 2 percentage points.

Description

Improved sequence matching network-based multi-turn dialogue model
Technical Field
The invention relates to the field of natural language processing, in particular to an algorithm for selecting an optimal reply by using an answer selection model in a multi-turn dialogue system.
Background
In recent years, with the surge of interest in artificial intelligence, chatbots and intelligent customer service have come into wide use, and how to obtain accurate answers is a research hot spot. Serving a user is a multi-turn dialogue process that must consider not only the question itself but also the context of the conversation, because the context provides much useful information and plays an important role in building a coherent dialogue. A representative retrieval-based model for multi-turn dialogue is the Sequential Matching Network (SMN). The model comprises three parts: utterance-response matching, match accumulation, and match prediction. The overall idea is as follows: the candidate response is paired with each utterance in the context, matching is performed at two granularities (word level and sentence level), and the two matching vectors formed for each utterance-response pair are stacked together and fed into a convolutional neural network to obtain a new matching vector for that pair. The matching vectors of all utterance-response pairs in one round of multi-turn dialogue are then fed, in the chronological order of the utterances, into a GRU network to compute the matching score. However, the utterance-response matching part of the model encodes the word-embedded sentences with a single-layer GRU network. Because a single-layer GRU has limited ability to extract deep-level features, the resulting encodings may contain noise (useless semantic information). Moreover, the match-accumulation part uses a convolutional neural network to extract deeper matching information from the word- and sentence-level matching matrices, but a CNN focuses mainly on local information, has limited ability to capture the overall semantics of a natural-language sequence, and can lose information. The matching vectors produced after the CNN may therefore contain incomplete matching information.
Disclosure of Invention
The present invention is directed to solving at least one of the above problems.
Therefore, the invention aims to provide an improved sequential-matching-network-based multi-turn dialogue model, which replaces the single-layer GRU network after word embedding with a Deep GRU network and replaces the original CNN with a GRU network. The performance of the improved network on the Ubuntu dataset and the Douban dataset is clearly better than that of the original SMN model.
To achieve this aim, the technical scheme of the invention is as follows:
An improved sequential-matching-network-based multi-turn dialogue model, comprising the following steps:
s1, segmenting all contexts U and candidate replies r in a multi-turn conversation, converting words into word vectors, inputting the word vectors into a word embedding part, and obtaining word vector representations U ═ U ═ of the conversation U and the replies r through the word embedding part1,…,unu]And R ═ R1,…,rnu],
Figure BDA0002811387720000021
A word vector representation of the ith word in u and r, respectively.
S2. Perform multi-dimensional matching, encoding U and R with different structures. U is first encoded with a deep recurrent neural network (Deep GRU). Compared with the single-layer GRU used by the SMN model, the Deep GRU extracts deep semantic information more effectively, so its output represents U better. R is encoded with a Deep GRU-GATE (Deep GRU with gate) network, whose outer structure is the same as that of the Deep GRU, except that its recurrent units are GRU-GATE units (GRUs with an input gate) rather than the conventional GRUs used in the Deep GRU. From these encodings, the matching matrices $M_1$ and $M_2$ of each utterance-response pair are computed at two granularities: word level and sentence level.
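As an illustration, a minimal sketch of the two matching matrices is given below. The patent does not spell out their exact form; the sketch follows the original SMN formulation (word-level dot products for $M_1$, a bilinear match over the encoder hidden states for $M_2$), and the bilinear weight A and all shapes are assumptions.

```python
import numpy as np

def matching_matrices(U, R, H_u, H_r, A):
    """Word-level and sentence-level matching matrices for one
    utterance-response pair; shapes and the bilinear weight A are
    assumptions following the original SMN formulation.

    U:   (n_u, d)  word embeddings of the utterance
    R:   (n_r, d)  word embeddings of the response
    H_u: (n_u, h)  Deep GRU encodings of the utterance
    H_r: (n_r, h)  Deep GRU-GATE encodings of the response
    A:   (h, h)    trainable bilinear matching weight
    """
    M1 = U @ R.T          # M1[i, j] = u_i . r_j      (word level)
    M2 = H_u @ A @ H_r.T  # M2[i, j] = h_i^T A h'_j   (sentence level)
    return M1, M2

# toy usage with random stand-ins
rng = np.random.default_rng(0)
n_u, n_r, d, h = 5, 4, 8, 6
M1, M2 = matching_matrices(rng.normal(size=(n_u, d)), rng.normal(size=(n_r, d)),
                           rng.normal(size=(n_u, h)), rng.normal(size=(n_r, h)),
                           rng.normal(size=(h, h)))
print(M1.shape, M2.shape)  # (5, 4) (5, 4)
```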
S3. Aggregate the matching matrices $M_1$ and $M_2$ obtained from multi-dimensional matching, and extract deeper matching information between the utterance and the response through a neural network. This neural network is changed from the convolutional neural network of the original model to a GRU recurrent neural network. The GRU network encodes the aggregated matrix M and outputs a matrix containing deeper matching information, $H_1 = [h_{1,1}, \ldots, h_{1,n_u}]$.
S4. The matching matrix $H_1$ is input into another GRU network, which encodes it to obtain the output $H_m = [h'_1, \ldots, h'_n]$.
S5. The output $H_m$ of step S4 is transformed and then passed through a softmax layer to output the matching score.
Compared with the prior art, the invention has the beneficial effects that:
1) By changing the single-layer GRU network into a Deep GRU network, replacing the CNN with a GRU network, and aggregating the multi-dimensional matching matrices earlier, the method provides an improved sequence-matching-network-based multi-turn dialogue model.
2) Using this improved sequential-matching-network-based model, the multi-turn dialogue model achieves clearly higher accuracy than the original sequential matching network model.
3) The method can be applied to the intelligent customer service systems of e-commerce platforms; it brings clear gains in accuracy and algorithm stability and is well suited to practical engineering work.
Drawings
FIG. 1 is a flow diagram of an improved sequential matching network based multi-turn dialogue model according to one embodiment of the present invention
FIG. 2 is a schematic diagram of the structure of a sequential matching network multi-turn dialogue model according to an embodiment of the present invention
FIG. 3 is a schematic diagram of the structure of an improved sequential matching network multi-turn dialogue model according to an embodiment of the present invention
FIG. 4 is a schematic diagram of a Deep GRU network structure according to an embodiment of the present invention
FIG. 5 is a schematic diagram of Deep GRU-GATE network structure according to an embodiment of the present invention
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the invention is further illustrated below with reference to the figures and examples.
Examples
Fig. 2 shows the Sequential Matching Network (SMN) according to an embodiment of the present invention. The SMN is a retrieval-based answer selection model, and as the figure shows, its structure consists mainly of three parts: utterance-response matching, match accumulation, and match prediction. The utterance-response matching part begins with a word embedding part, whose main role is to convert words into vector representations. Its input is all the context utterances u and the candidate reply r in one round of multi-turn dialogue; after the word embedding layer, a single-layer GRU network produces two feature matrices $M_1$ and $M_2$. A matching matrix V containing deep-level matching information is then obtained through convolution and pooling operations, and finally a matching score is obtained through a softmax function.
Fig. 3 shows the network improved to address the deficiencies of the SMN. As the figure shows, the improved network replaces the single-layer GRU with a Deep GRU, moves the aggregation of $M_1$ and $M_2$ ahead of the neural network, and feeds the aggregated features $M_1$ and $M_2$ into a GRU network instead of the original CNN.
The method provided by the invention comprises the following specific steps:
a) All the context utterances u and the candidate reply r in a round of multi-turn dialogue are input into the word embedding part to obtain word vectors, and the word vector representations of all the words are then combined to form a word embedding matrix.
Wherein the specific method of the step a) is as follows:
Context u and reply r are represented as
$u = \{u_1, u_2, \ldots, u_{n_u}\}$
$r = \{r_1, r_2, \ldots, r_{n_r}\}$ (1)
where $u_i$ and $r_i$ denote the i-th word of u and r, respectively.
The word embedding part converts each word of an input sentence into a word vector of fixed length by looking it up in a word vector matrix (embedding matrix), and then combines the word vectors of all the words into the sentence's word embedding matrix.
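As a minimal sketch of this lookup (the toy vocabulary is an assumption; the 200-dimensional vectors come from the experiments below):

```python
import numpy as np

vocab = {"<unk>": 0, "how": 1, "are": 2, "you": 3}  # toy vocabulary (assumption)
emb_dim = 200                                       # dimension used in the experiments
E = np.random.default_rng(0).normal(size=(len(vocab), emb_dim))  # embedding matrix

def embed(sentence):
    """Look up each word's row in E and stack the rows into the
    sentence's word embedding matrix, as in step a)."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in sentence.lower().split()]
    return E[ids]                                   # shape: (num_words, emb_dim)

U = embed("How are you")
print(U.shape)  # (3, 200)
```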
b) The word embedding matrix U is input into a deep recurrent neural network (Deep GRU), which encodes it to obtain $M_1$; the word embedding matrix R is input into a deep recurrent neural network with an attention mechanism (Deep GRU-GATE), which encodes it to obtain $M_2$. The improved network first aggregates $M_1$ and $M_2$ and then extracts deeper matching information through a neural network.
Wherein the specific method of the step b) is as follows:
The word embedding matrix U is input into the Deep GRU network; the output of each GRU layer serves as the input of the next layer, and the weighted sum of the outputs of all layers is taken as the output of the whole Deep GRU network:
$h_i = \sum_{j=1}^{l} w_j h_i^{(j)}$ (2)
where $h_i^{(j)}$ denotes the hidden state of the j-th GRU layer of the Deep GRU at time i, $w_j$ is the weighting coefficient of the j-th layer, normalized by softmax and shared among all utterances and replies, l is the number of Deep GRU layers, and $h_i$ is the output of the whole Deep GRU network at time i.
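A short sketch of the weighted combination in equation (2); the layer outputs here are random stand-ins, and the equal initial weights are an assumption:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def deep_gru_output(layer_states, w_logits):
    """Equation (2): h_i = sum_j w_j * h_i^(j).

    layer_states: (l, T, h) hidden states of l stacked GRU layers over
    T time steps; w_logits: the l shared layer weights before softmax."""
    w = softmax(w_logits)                        # shared across utterances/replies
    return np.tensordot(w, layer_states, axes=1) # weighted sum, shape (T, h)

l, T, h = 3, 7, 200                              # 3 layers, as in the experiments
states = np.random.default_rng(1).normal(size=(l, T, h))
H = deep_gru_output(states, np.zeros(l))         # equal weights before training
print(H.shape)  # (7, 200)
```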
The attention vector in the Deep GRU-GATE network is derived from the encoded output $h = [h_1, \ldots, h_n]$ of the context U by a linear transformation:
$\mathrm{Attention} = L[h_1, \ldots, h_n]$ (3)
where $h_i$ is the output of the i-th utterance of the context after the Deep GRU network, and $L(\cdot)$ denotes a linear transformation. Since utterances that occur earlier in the context have less influence on the reply, only the utterance closest in time to the reply r, namely $u_n$, is considered here when generating the attention vector. Hence $L[h_1, \ldots, h_n] = h_n$, and the attention vector $\mathrm{Attention} = h_n$.
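The patent names the GRU-GATE unit (a GRU with an input gate) but does not give its update equations. One plausible form, offered purely as an assumption, adds a gate conditioned on the attention vector that scales the input before the standard GRU update of equations (4) below:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_gate_step(x_t, h_prev, attn, W, Wg, Ug, Wa):
    """One step of a hypothetical GRU-with-input-gate (assumed form).

    An extra gate g_t, conditioned on the attention vector attn (here
    h_n of the context), scales the input x_t before the GRU update."""
    g_t = sigmoid(x_t @ Wg + h_prev @ Ug + attn @ Wa)  # assumed input gate
    x_t = g_t * x_t                                    # gated input
    r_t = sigmoid(x_t @ W["xr"] + h_prev @ W["hr"])    # reset gate
    z_t = sigmoid(x_t @ W["xz"] + h_prev @ W["hz"])    # update gate
    h_hat = np.tanh(x_t @ W["xh"] + (r_t * h_prev) @ W["hh"])
    return (1 - z_t) * h_prev + z_t * h_hat

# toy usage with random stand-ins
rng = np.random.default_rng(2)
d = 6
W = {k: rng.normal(size=(d, d)) for k in ("xr", "xz", "xh", "hr", "hz", "hh")}
h_t = gru_gate_step(rng.normal(size=d), np.zeros(d), rng.normal(size=d),
                    W, rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                    rng.normal(size=(d, d)))
print(h_t.shape)  # (6,)
```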
The forward propagation formulas of the GRU network are:
$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1})$
$z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1})$
$\tilde{h}_t = \tanh(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}))$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$ (4)
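A direct NumPy transcription of equations (4) (bias terms are omitted, as in the formulas above; the shapes in the toy usage are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wxr, Whr, Wxz, Whz, Wxh, Whh):
    """One GRU step, exactly as in equations (4)."""
    r_t = sigmoid(x_t @ Wxr + h_prev @ Whr)            # reset gate
    z_t = sigmoid(x_t @ Wxz + h_prev @ Whz)            # update gate
    h_hat = np.tanh(x_t @ Wxh + (r_t * h_prev) @ Whh)  # candidate state
    return (1 - z_t) * h_prev + z_t * h_hat            # new hidden state

d, h = 8, 5
rng = np.random.default_rng(3)
h_t = gru_step(rng.normal(size=d), np.zeros(h),
               rng.normal(size=(d, h)), rng.normal(size=(h, h)),
               rng.normal(size=(d, h)), rng.normal(size=(h, h)),
               rng.normal(size=(d, h)), rng.normal(size=(h, h)))
print(h_t.shape)  # (5,)
```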
The aggregation operation splices the matrices $M_1$ and $M_2$ along the first dimension. The process is formulated as:
$Y = \sigma(X_i W_i + b_i)$ (5)
where $X_i$ is the spliced input and $Y$ is the matrix formed after aggregation (the matrix M used in the next step).
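A minimal sketch of the aggregation in equation (5): the two matching matrices are spliced along the first dimension and passed through an affine map with a sigmoid. The weight shapes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aggregate(M1, M2, W, b):
    """Equation (5): splice the two matching matrices along the first
    dimension, then Y = sigma(X W + b) gives the aggregated matrix M."""
    X = np.concatenate([M1, M2], axis=0)  # splice along the first dimension
    return sigmoid(X @ W + b)

rng = np.random.default_rng(4)
n_u, n_r = 5, 4
M1, M2 = rng.normal(size=(n_u, n_r)), rng.normal(size=(n_u, n_r))
M = aggregate(M1, M2, rng.normal(size=(n_r, n_r)), np.zeros(n_r))
print(M.shape)  # (10, 4)
```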
c) The aggregated matrix M is input into the modified network's GRU, which encodes it and outputs a matrix containing deeper matching information, $H_1 = [h_{1,1}, \ldots, h_{1,n_u}]$, where $h_{1,i}$ denotes the hidden state of the GRU network at time i.
d) The hidden state of the GRU network at the last time step, $h_{1,n_u}$, is taken as the input and fed into another GRU network, which encodes it to obtain the output $H_m = [h'_1, \ldots, h'_n]$.
This part plays two roles: (1) it models the dependency and temporal relationships between the utterances in the context; (2) it uses the chronological order of the context utterances $u_1, u_2, \ldots, u_{n_u}$ to supervise the accumulation of matching information in the GRU's hidden states at each time step. The reset and update gates of the GRU control how the matching information flows through the network, so that the useful part flows from the current time step to the next while the noise is filtered out, as the sketch below illustrates.
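A sketch of this chronological accumulation: the per-utterance match vectors are fed into a GRU in temporal order, and the gates decide what matching information survives to the next time step. The compact gru_step repeats equations (4) so the sketch is self-contained; all inputs and sizes are stand-ins:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W):
    """One GRU step (same equations as (4)); W holds the six weight blocks."""
    r = sigmoid(x @ W["xr"] + h @ W["hr"])
    z = sigmoid(x @ W["xz"] + h @ W["hz"])
    h_hat = np.tanh(x @ W["xh"] + (r * h) @ W["hh"])
    return (1 - z) * h + z * h_hat

def accumulate_matches(match_vectors, W, hidden):
    """Step d): feed each utterance's match vector into the GRU in
    chronological order; the update/reset gates pass useful matching
    information forward and filter out noise."""
    h = np.zeros(hidden)
    outputs = []
    for v in match_vectors:        # u_1, u_2, ..., u_nu in temporal order
        h = gru_step(v, h, W)
        outputs.append(h)
    return np.stack(outputs)       # H_m = [h'_1, ..., h'_n]

rng = np.random.default_rng(5)
n, d, hidden = 10, 50, 50          # 10 context utterances, 50-dim match vectors
W = {k: rng.normal(size=(d if k[0] == "x" else hidden, hidden)) * 0.1
     for k in ("xr", "xz", "xh", "hr", "hz", "hh")}
Hm = accumulate_matches(rng.normal(size=(n, d)), W, hidden)
print(Hm.shape)  # (10, 50)
```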
e) Finally, the output $H_m = [h'_1, \ldots, h'_n]$ is transformed and then passed through a softmax layer to output the matching score.
Wherein the specific method of the step e) is as follows:
For $H_m = [h'_1, \ldots, h'_n]$, define a function g(u, r):
$g(u,r) = \mathrm{softmax}(W L[h'_1, \ldots, h'_n] + b)$ (6)
where W and b are parameters and $L[h'_1, \ldots, h'_n]$ is a linear transformation of $h'_1, \ldots, h'_n$, for which there are three calculation methods: (1) directly select the last hidden state, i.e. $L[h'_1, \ldots, h'_n] = h'_n$; (2) take a linear combination of all hidden states, i.e. $L[h'_1, \ldots, h'_n] = \sum_i w_i h'_i$; (3) weight $h'_1, \ldots, h'_n$ with an attention mechanism, i.e. $L[h'_1, \ldots, h'_n] = \mathrm{Attention}[h'_1, \ldots, h'_n]$.
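A sketch of equation (6) with the three choices of $L[\cdot]$ listed above. The two-class softmax output (match / non-match) and the exact form of the attention weighting are assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def match_score(H, W, b, mode="last", w=None):
    """Equation (6): g(u, r) = softmax(W . L[h'_1..h'_n] + b).

    mode selects L: 'last' uses h'_n; 'linear' needs w of shape (n,);
    'attention' needs a query vector w of shape (h,) (assumed form)."""
    if mode == "last":            # (1) last hidden state
        L = H[-1]
    elif mode == "linear":        # (2) linear combination of all states
        L = w @ H
    else:                         # (3) attention-style weighting (assumption)
        L = softmax(H @ w) @ H
    return softmax(W @ L + b)

rng = np.random.default_rng(6)
n, h = 10, 50
H = rng.normal(size=(n, h))       # H_m = [h'_1, ..., h'_n]
score = match_score(H, rng.normal(size=(2, h)), np.zeros(2),
                    mode="linear", w=softmax(rng.normal(size=n)))
print(score)                      # probabilities for match / non-match
```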
Examples
The invention compares and analyzes the accuracy of the improved model and the SMN model on two public datasets, the Ubuntu dataset and the Douban dataset, as follows:
the Ubantu data set is an english data set and comprises three parts of a training set, a verification set and a test set, wherein the number of context-response pairs (context-response pairs) contained in each part is respectively as follows: 1 million, 50 ten thousand. Each context-reply pair in the training set contains one positive answer and one negative answer (the interfering answer), and each context-reply pair in the validation set and the test set contains one positive answer and nine negative answers. The double data set is an open-domain Chinese dialogue data set which also comprises a training set, a verification set and a test set.
The deep learning framework used for the experiments was TensorFlow. The word vector matrices used by the word embedding parts of the improved SMN model and the SMN model were trained on the Ubuntu dataset and the Douban dataset respectively with the word2vec method proposed by Mikolov et al.; each word vector has dimension 200. In the improved SMN model, the Deep GRU networks have 3 layers, and all GRU networks have 200 internal neurons. In the SMN model, the first GRU network, located in the multi-dimensional matching part, has 200 neurons, and the last GRU network has 50 neurons. All trainable parameters of the improved SMN model and the SMN model are updated with the Adam algorithm. During training, batch_size is set to 40, the maximum length of each utterance is set to 50, and the context of each dialogue contains 10 utterances. Since each question in the Ubuntu test set has only one correct answer, R2@1 and R10@1 are taken as evaluation metrics for the Ubuntu dataset; the experimental results are given in Table 1 below. In the Douban test set each context has more than one correct reply, so MAP and MRR are used as evaluation metrics; the results are given in Table 2 below.
TABLE 1 Experimental results of the improved sequential matching network multi-turn dialogue model on the Ubuntu dataset
Model                 R2@1    R10@1   R10@2   R10@5
SMN model             0.926   0.726   0.835   0.847
Improved SMN model    0.938   0.745   0.859   0.862
TABLE 2 Experimental results of the improved sequential matching network multi-turn dialogue model on the Douban dataset
Model                 MAP     MRR     R10@2   R10@5
SMN model             0.529   0.567   0.233   0.396
Improved SMN model    0.547   0.589   0.258   0.417
Table 1 shows that on the Ubuntu dataset the improved SMN model proposed herein outperforms the SMN model by 1.2% and 1.9% on the evaluation metrics R2@1 and R10@1, respectively; Table 2 shows that on the Douban dataset the improved SMN model outperforms it by 1.8% and 2.2% on MAP and MRR, respectively. The experimental results on the two test sets show that the invention's improvements to the deficiencies of the SMN model have a real practical effect, demonstrating the effectiveness of the improved model.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (1)

1. An improved sequence-matching-network-based multi-turn dialogue model, characterized by comprising the following steps:
s1, inputting all the contexts u and the candidate answers r in one-round multi-round conversation into a word embedding part to obtain word vectors, and then combining the word vector representations of all the words together to form a word embedding matrix.
S2. Input the word embedding matrix U into a deep recurrent neural network (Deep GRU) to encode it, obtaining $M_1$; input the word embedding matrix R into a deep recurrent neural network with an attention mechanism (Deep GRU-GATE) to encode it, obtaining $M_2$. The improved network first aggregates $M_1$ and $M_2$, then extracts deeper matching information through a neural network.
The output of the Deep GRU network is formulated as:
$h_i = \sum_{j=1}^{l} w_j h_i^{(j)}$
The aggregation operation splices the matrices $M_1$ and $M_2$ along the first dimension, formulated as:
$Y = \sigma(X_i W_i + b_i)$
s3, inputting the matrix M formed after aggregation into the GRU network after modification, and the GRU network encodes the matrix M after aggregation and outputs a matrix H containing deeper matching information1=[h1,1,…,h1,nu],h1,iIndicating the hidden state of the GRU network at the i-th time.
S4. Take the hidden state of the GRU network at the last time step, $h_{1,n_u}$, as the input to another GRU network, which encodes it to obtain the output $H_m = [h'_1, \ldots, h'_n]$.
S5. Finally, transform the output $H_m = [h'_1, \ldots, h'_n]$ and pass it through a softmax layer to output the matching score.
CN202011392502.5A 2020-12-02 2020-12-02 Improved sequence matching network-based multi-turn dialogue model Pending CN112632236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011392502.5A CN112632236A (en) 2020-12-02 2020-12-02 Improved sequence matching network-based multi-turn dialogue model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011392502.5A CN112632236A (en) 2020-12-02 2020-12-02 Improved sequence matching network-based multi-turn dialogue model

Publications (1)

Publication Number Publication Date
CN112632236A 2021-04-09

Family

ID=75308367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011392502.5A Pending CN112632236A (en) 2020-12-02 2020-12-02 Improved sequence matching network-based multi-turn dialogue model

Country Status (1)

Country Link
CN (1) CN112632236A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083729A (en) * 2019-04-26 2019-08-02 北京金山数字娱乐科技有限公司 A kind of method and system of picture search
CN110457675A (en) * 2019-06-26 2019-11-15 平安科技(深圳)有限公司 Prediction model training method, device, storage medium and computer equipment
CN110309287A (en) * 2019-07-08 2019-10-08 北京邮电大学 The retrieval type of modeling dialog round information chats dialogue scoring method
CN111274375A (en) * 2020-01-20 2020-06-12 福州大学 Multi-turn dialogue method and system based on bidirectional GRU network

Similar Documents

Publication Publication Date Title
CN109543180B (en) Text emotion analysis method based on attention mechanism
CN108681610B (en) generating type multi-turn chatting dialogue method, system and computer readable storage medium
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN109829299B (en) Unknown attack identification method based on depth self-encoder
CN110413785A (en) A kind of Automatic document classification method based on BERT and Fusion Features
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN111274375B (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN112115687B (en) Method for generating problem by combining triplet and entity type in knowledge base
CN112232087B (en) Specific aspect emotion analysis method of multi-granularity attention model based on Transformer
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN112784532B (en) Multi-head attention memory system for short text sentiment classification
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN115964467A (en) Visual situation fused rich semantic dialogue generation method
CN113297364A (en) Natural language understanding method and device for dialog system
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN112527966A (en) Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism
CN111914553B (en) Financial information negative main body judging method based on machine learning
Kesavan et al. Deep learning based automatic image caption generation
CN115841119B (en) Emotion cause extraction method based on graph structure
CN110597968A (en) Reply selection method and device
CN114036298B (en) Node classification method based on graph convolution neural network and word vector
CN113887836B (en) Descriptive event prediction method integrating event environment information
CN112528168B (en) Social network text emotion analysis method based on deformable self-attention mechanism
CN113807079A (en) End-to-end entity and relation combined extraction method based on sequence-to-sequence
CN113468874B (en) Biomedical relation extraction method based on graph convolution self-coding

Legal Events

Code   Title
PB01   Publication
SE01   Entry into force of request for substantive examination