CN111274375A - Multi-turn dialogue method and system based on bidirectional GRU network - Google Patents
Multi-turn dialogue method and system based on bidirectional GRU network

- Publication number: CN111274375A
- Application number: CN202010067240.9A
- Authority: CN (China)
- Prior art keywords: answer, vector, context, dialog, inputting
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06F16/3329: Information retrieval of unstructured textual data; natural language query formulation or dialogue systems
- G06F16/3344: Query execution using natural language analysis
- G06F16/3347: Query execution using vector based model
- G06N3/045: Neural network architectures; combinations of networks
- G06N3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention relates to a multi-turn dialogue method and system based on a bidirectional GRU network, wherein the method comprises the following steps. Step A: collecting dialogue context and answer data and constructing a dialogue training set D. Step B: training a deep learning network model M fusing bidirectional GRU networks by using the dialogue training set D. Step C: conversing with the user, inputting the user's question into the trained deep learning network model M, and outputting the matched answer. The method and system help improve the matching of answers to user questions.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a multi-turn dialogue method and a multi-turn dialogue system based on a bidirectional GRU network.
Background
In recent years, the rapid development of deep learning and neural networks has revolutionized the field of artificial intelligence. As one of the core technologies of this field, multi-turn dialogue has become a research hotspot. It can be widely applied in industries such as human-computer interaction, smart homes, intelligent customer service, intelligent family tutoring, and social robots, and therefore has great research significance, academic value, and application value, attracting sustained attention from academia and industry alike.
Lowe et al. directly concatenate the dialogue context utterances to form a concatenated context matrix for matching with the answer, thereby taking into account the overall semantics of the dialogue context. Yan et al. concatenate the context sentences with the input message as a new query and perform matching with a deep neural network architecture. Zhou et al. improve multi-angle response selection using a multi-view model that contains both an utterance view and a word view. Zhou et al. also propose an attention-based algorithm for matching the dialogue context with the answer: it constructs two matching matrices using a scaled self-attention mechanism and an interactive attention mechanism, and its effectiveness has been verified. Wu et al. match the candidate answer against each context sentence and use an RNN to preserve the temporal ordering of sentence semantics, which improves system performance and indicates that interaction between the answer and each context sentence is effective. Zhou et al. likewise interact the answer with each context sentence, but use encoding layers rather than an RNN to represent sentences at different levels; they apply attention mechanisms to extract more dependency information between the dialogue and the answer and aggregate all of this information to compute the matching degree. Existing attention-mechanism models can extract more dependency information between dialogues and answers, but they are easily affected by noise and cannot capture long-term dependencies.
Disclosure of Invention
The invention aims to provide a multi-turn dialogue method and system based on a bidirectional GRU network, which help improve the matching of answers to user questions.
In order to achieve the above object, the invention adopts the following technical scheme: a multi-turn dialogue method based on a bidirectional GRU network, comprising the following steps:
step A: collecting dialogue context and answer data, and constructing a dialogue training set TS;
step B: training a deep learning network model fusing bidirectional GRU networks by using the dialogue training set TS;
step C: conversing with the user, inputting the user's question into the trained deep learning network model, and outputting the matched answer.
Further, the step B specifically includes the following steps:
step B1: traversing the dialogue training set TS, and encoding the dialogue context and answer of each training sample to obtain initial characterization vectors;
step B2: inputting the initial characterization vectors of the dialogue context and the answer into a multi-head attention mechanism module to obtain semantic characterization vectors of the dialogue and the answer, and calculating the word similarity matrix of the dialogue and the answer;
step B3: inputting the initial characterization vectors of the dialogue context and the answer obtained in step B1 into a bidirectional GRU network, calculating the bidirectional hidden states of the dialogue and the answer, and then calculating the forward and reverse semantic representation matrices of the dialogue and the answer;
step B4: merging the word similarity matrix, the forward semantic representation matrix, and the reverse semantic representation matrix of the dialogue and the answer into a tensor, inputting it into a two-dimensional convolutional neural network, and then performing feature dimensionality reduction to obtain a characterization vector sequence fusing the semantic information of the dialogue and the answer;
step B5: inputting the characterization vector sequence obtained in step B4 into a bidirectional GRU network to obtain a characterization vector $F$ fusing the context dependency and semantic information of the dialogue and the answer;
step B6: repeating steps B2-B5 to calculate the characterization vectors $F$ fusing the context dependency and semantic information of the dialogue and the answer for all training samples in the dialogue training set;
step B7: inputting the characterization vectors $F$ of all samples into the fully-connected layer of the deep learning network model, calculating the gradient of each parameter in the deep network by back-propagation according to the target loss function Loss, and updating the parameters by stochastic gradient descent;
step B8: terminating the training of the deep learning network model when the loss value produced by the model falls below a set threshold or the maximum number of iterations is reached.
Further, in step B1, the dialogue training set is represented as $TS = \{(U_i, a_i, y_i)\}_{i=1}^{N}$, where $N$ is the number of training samples and $(U, a)$ denotes a training sample consisting of a dialogue context $U$ and an answer $a$; the dialogue context $U$ consists of several sentences of the dialogue process, and each sentence in $U$ and the answer $a$ are encoded separately to obtain their initial characterization vectors. If $u_t$ denotes the $t$-th sentence in the dialogue context $U$, its initial characterization vector is expressed as:

$$E_{u_t} = [e_{u_t}^1, e_{u_t}^2, \ldots, e_{u_t}^{L_t}]$$

The initial characterization vector of the answer $a$ is expressed as:

$$E_a = [e_a^1, e_a^2, \ldots, e_a^{L_a}]$$

where $L_t$ and $L_a$ denote the numbers of words remaining in $u_t$ and $a$ after word segmentation and stop-word removal, $e_{u_t}^i$ and $e_a^i$ are the word vectors of the $i$-th words of $u_t$ and $a$, looked up in the pre-trained word vector matrix $\mathbf{E} \in \mathbb{R}^{|D| \times d_1}$, $d_1$ is the dimension of a word vector, and $|D|$ is the number of words in the dictionary.
Further, the step B2 specifically includes the following steps:
step B21: selecting a number of heads $s$ that divides $d_1$ evenly, and for each sentence $u_t$ in the dialogue context, splitting its initial characterization vector $E_{u_t}$ and the answer's initial characterization vector $E_a$ into $s$ sub-vectors along the last dimension, obtaining the sub-vector sequences $\{u_t^1, u_t^2, \ldots, u_t^s\}$ and $\{a^1, a^2, \ldots, a^s\}$, where $u_t^h$ is the $h$-th sub-vector of $u_t$ and $a^h$ is the $h$-th sub-vector of $a$;

step B22: forming a sub-vector pair from each sub-vector of $u_t$ and the corresponding sub-vector of $a$, i.e. $(u_t^h, a^h)$, $h = 1, 2, \ldots, s$, inputting the pairs into the attention mechanism module, and calculating the semantic characterization vector $\hat{u}_t^h$ of $u_t^h$ and the semantic characterization vector $\hat{a}^h$ of $a^h$:

$$\hat{u}_t^h = \mathrm{softmax}\!\left(\frac{u_t^h (a^h)^T}{\sqrt{d_1/s}}\right) a^h, \qquad \hat{a}^h = \mathrm{softmax}\!\left(\frac{a^h (u_t^h)^T}{\sqrt{d_1/s}}\right) u_t^h$$

where $T$ denotes the matrix transpose operation; then calculating the weighted concatenation of $\hat{u}_t^1, \ldots, \hat{u}_t^s$ and of $\hat{a}^1, \ldots, \hat{a}^s$ to obtain the semantic characterization vector $\hat{u}_t$ of $u_t$ and the semantic characterization vector $\hat{a}$ of $a$:

$$\hat{u}_t = [\hat{u}_t^1, \ldots, \hat{u}_t^s]\, W_1, \qquad \hat{a} = [\hat{a}^1, \ldots, \hat{a}^s]\, W_2$$

where $W_1, W_2$ are training parameters of the multi-head attention mechanism;

step B23: calculating the word similarity matrix between each sentence in the dialogue context and the answer; for $u_t$, the $t$-th sentence in the dialogue context, its word similarity matrix $M_{1,t}$ with the answer $a$ is calculated as:

$$M_{1,t} = \hat{u}_t \cdot \hat{a}^T$$
further, the step B3 specifically includes the following steps:
step B31: taking the initial characterization vector of the answer as a sequence of word vectors, inputting it into a bidirectional GRU network, and calculating the forward and reverse hidden state vectors;

the initial characterization vector $E_a$ of the answer is regarded as the sequence $e_a^1, e_a^2, \ldots, e_a^{L_a}$, which is fed in order into the forward GRU to obtain the forward hidden state vectors $\overrightarrow{h}_a^1, \overrightarrow{h}_a^2, \ldots, \overrightarrow{h}_a^{L_a}$ of the answer; $e_a^{L_a}, \ldots, e_a^2, e_a^1$ is fed in order into the reverse GRU to obtain the reverse hidden state vectors $\overleftarrow{h}_a^1, \overleftarrow{h}_a^2, \ldots, \overleftarrow{h}_a^{L_a}$ of the answer, where $\overrightarrow{h}_a^i, \overleftarrow{h}_a^i \in \mathbb{R}^{d_2}$ and $d_2$ is the number of units of the GRU;

step B32: taking the initial characterization vector of each sentence in the dialogue context as a sequence of word vectors, inputting it into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors;

for $u_t$, the $t$-th sentence in the dialogue context, the sequence $e_{u_t}^1, \ldots, e_{u_t}^{L_t}$ is fed in order into the forward GRU to obtain the forward hidden state vectors $\overrightarrow{h}_{u_t}^1, \ldots, \overrightarrow{h}_{u_t}^{L_t}$ of $u_t$, and $e_{u_t}^{L_t}, \ldots, e_{u_t}^1$ is fed in order into the reverse GRU to obtain the reverse hidden state vectors $\overleftarrow{h}_{u_t}^1, \ldots, \overleftarrow{h}_{u_t}^{L_t}$ of $u_t$, where $\overrightarrow{h}_{u_t}^i, \overleftarrow{h}_{u_t}^i \in \mathbb{R}^{d_2}$;

Step B33: calculating the forward and reverse semantic representation matrices of each sentence in the dialogue context; for $u_t$, the $t$-th sentence in the dialogue context, its forward semantic representation matrix $M_{2,t}$ and reverse semantic representation matrix $M_{3,t}$ with the answer $a$ are calculated as:

$$M_{2,t} = \overrightarrow{H}_{u_t} \cdot \overrightarrow{H}_a^{\,T}, \qquad M_{3,t} = \overleftarrow{H}_{u_t} \cdot \overleftarrow{H}_a^{\,T}$$

where $\overrightarrow{H}_{u_t}$ and $\overrightarrow{H}_a$ stack the forward hidden state vectors of $u_t$ and $a$ row-wise, and $\overleftarrow{H}_{u_t}$ and $\overleftarrow{H}_a$ stack the reverse hidden state vectors.
further, the step B4 specifically includes the following steps:
Mt=[M1,t,M2,t,M3,t]
Step B42: will MtInputting the data into a two-dimensional convolution neural network for convolution and pooling, and then inputting the data into a full-connection layer for dimensionality reduction to obtain a fusion utCharacterization vector of semantic information of aWherein d is3Dimension after dimension reduction of the full connection layer;
step B43: for each sentence in the dialog context U, a characterization vector of its semantic information with the answer a is calculatedWherein L isuIs the number of sentences in the dialog context U.
Further, in step B5, the characterization vector sequence $\{v_1, v_2, \ldots, v_{L_u}\}$ is input into a bidirectional GRU network, which models the relationship between the dialogue context and the answer; the finally output hidden state vector is taken as the characterization vector $F$ fusing the context dependency and semantic information of the dialogue and the answer, where $F \in \mathbb{R}^{2 d_2}$.
Further, the step B7 specifically includes the following steps:
step B71: inputting the final characterization vector $F$ into the fully-connected layer, and calculating the probability of the answer belonging to each category using softmax normalization:

$$y = W_s F + b_s, \qquad g_c(U, a) = \mathrm{softmax}(y)$$

where $W_s$ is the weight matrix of the fully-connected layer, $b_s$ is its bias term, and $g_c(U, a)$ is the probability that the answer belongs to category $c$ given the dialogue context $U$ of the training sample $(U, a)$ processed in step B1, with $0 \le g_c(U, a) \le 1$ and $c \in \{\text{correct}, \text{wrong}\}$;

step B72: calculating the loss value using cross entropy as the loss function, updating the learning rate with the gradient optimization algorithm AdaGrad, and training the model by minimizing the loss function through back-propagation updates of the model parameters;

the minimized loss function Loss is calculated as:

$$Loss = -\sum_{i=1}^{N} \left[\, y_i \log g_{\text{correct}}(U_i, a_i) + (1 - y_i) \log\big(1 - g_{\text{correct}}(U_i, a_i)\big) \right]$$

where $(U_i, a_i)$ denotes the $i$-th training sample in the dialogue training set TS and $y_i \in \{0, 1\}$ is its class label.
The invention also provides a multi-turn dialogue system adopting the above method, which comprises:
a training set building module for collecting dialogue context and answer data and constructing a dialogue training set TS;
a model training module for training a deep learning network model fusing bidirectional GRU networks by using the dialogue training set TS; and
a multi-turn dialogue module for conversing with the user, inputting the user's questions into the trained deep learning network model, and outputting the best-matched answers.
Compared with the prior art, the invention has the following beneficial effects. It provides a multi-turn dialogue method and system based on a bidirectional GRU network that capture long-term dependencies using multi-head attention; because the multi-head attention mechanism is finer-grained than the traditional attention mechanism, the influence of noise can be reduced. Meanwhile, the bidirectional GRU better captures the temporal relationships between sentences, improving the accuracy and matching of answers to the user's questions. The method therefore has strong practicability and broad application prospects.
Drawings
Fig. 1 is a flowchart of a method implementation of an embodiment of the invention.
Fig. 2 is a schematic structural diagram of a system according to an embodiment of the present invention.
FIG. 3 is a diagram of a model architecture according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
The invention provides a multi-turn dialogue method based on a bidirectional GRU network, which comprises the following steps as shown in figure 1:
Step A: collecting dialogue context and answer data to construct a dialogue training set TS.
Step B: training a deep learning network model fusing bidirectional GRU networks by using the dialogue training set TS.
FIG. 3 is an architecture diagram of a deep learning network model in an embodiment of the invention. Training the model using a dialog training set TS specifically comprises the steps of:
Step B1: traversing the dialogue training set TS, and encoding the dialogue context and answer of each training sample to obtain initial characterization vectors.
The dialogue training set is represented as $TS = \{(U_i, a_i, y_i)\}_{i=1}^{N}$, where $N$ is the number of training samples and $(U, a)$ denotes a training sample consisting of a dialogue context $U$ and an answer $a$; the dialogue context $U$ consists of several sentences of the dialogue process, and each sentence in $U$ and the answer $a$ are encoded separately to obtain their initial characterization vectors. If $u_t$ denotes the $t$-th sentence in the dialogue context $U$, its initial characterization vector is expressed as:

$$E_{u_t} = [e_{u_t}^1, e_{u_t}^2, \ldots, e_{u_t}^{L_t}]$$

The initial characterization vector of the answer $a$ is expressed as:

$$E_a = [e_a^1, e_a^2, \ldots, e_a^{L_a}]$$

where $L_t$ and $L_a$ denote the numbers of words remaining in $u_t$ and $a$ after word segmentation and stop-word removal, $e_{u_t}^i$ and $e_a^i$ are the word vectors of the $i$-th words of $u_t$ and $a$, looked up in the pre-trained word vector matrix $\mathbf{E} \in \mathbb{R}^{|D| \times d_1}$, $d_1$ is the dimension of a word vector, and $|D|$ is the number of words in the dictionary.
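A minimal sketch of this encoding step follows, assuming a PyTorch implementation (the patent does not prescribe a framework); the toy dictionary, the whitespace tokenizer, and the random stand-in for the pre-trained word vector matrix are illustrative assumptions:

```python
# Sketch of step B1 under assumed PyTorch conventions: look up pre-trained
# word vectors for each remaining word of a sentence u_t (or answer a).
import torch
import torch.nn as nn

d1 = 200                                      # word-vector dimension d1
vocab = {"<pad>": 0, "hello": 1, "world": 2}  # toy dictionary D
pretrained = torch.randn(len(vocab), d1)      # stand-in for pre-trained vectors
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

def encode(sentence: str) -> torch.Tensor:
    """Return the initial characterization vector of a sentence:
    an (L, d1) matrix of word vectors after trivial segmentation."""
    ids = [vocab[w] for w in sentence.split() if w in vocab]  # stands in for segmentation + stop-word removal
    return embedding(torch.tensor(ids))

E_ut = encode("hello world")  # initial characterization vector of u_t, shape (2, d1)
```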
Step B2: inputting the initial characterization vectors of the dialogue context and the answer into a multi-head attention mechanism module to obtain semantic characterization vectors of the dialogue and the answer, and calculating the word similarity matrix of the dialogue and the answer. The method specifically comprises the following steps:
Step B21: selecting a number of heads $s$ that divides $d_1$ evenly, and for each sentence $u_t$ in the dialogue context, splitting its initial characterization vector $E_{u_t}$ and the answer's initial characterization vector $E_a$ into $s$ sub-vectors along the last dimension, obtaining the sub-vector sequences $\{u_t^1, u_t^2, \ldots, u_t^s\}$ and $\{a^1, a^2, \ldots, a^s\}$, where $u_t^h$ is the $h$-th sub-vector of $u_t$ and $a^h$ is the $h$-th sub-vector of $a$.

Step B22: forming a sub-vector pair from each sub-vector of $u_t$ and the corresponding sub-vector of $a$, i.e. $(u_t^h, a^h)$, $h = 1, 2, \ldots, s$, inputting the pairs into the attention mechanism module, and calculating the semantic characterization vector $\hat{u}_t^h$ of $u_t^h$ and the semantic characterization vector $\hat{a}^h$ of $a^h$:

$$\hat{u}_t^h = \mathrm{softmax}\!\left(\frac{u_t^h (a^h)^T}{\sqrt{d_1/s}}\right) a^h, \qquad \hat{a}^h = \mathrm{softmax}\!\left(\frac{a^h (u_t^h)^T}{\sqrt{d_1/s}}\right) u_t^h$$

where $T$ denotes the matrix transpose operation; then calculating the weighted concatenation of $\hat{u}_t^1, \ldots, \hat{u}_t^s$ and of $\hat{a}^1, \ldots, \hat{a}^s$ to obtain the semantic characterization vector $\hat{u}_t$ of $u_t$ and the semantic characterization vector $\hat{a}$ of $a$:

$$\hat{u}_t = [\hat{u}_t^1, \ldots, \hat{u}_t^s]\, W_1, \qquad \hat{a} = [\hat{a}^1, \ldots, \hat{a}^s]\, W_2$$

where $W_1, W_2$ are training parameters of the multi-head attention mechanism.

Step B23: calculating the word similarity matrix between each sentence in the dialogue context and the answer; for $u_t$, the $t$-th sentence in the dialogue context, its word similarity matrix $M_{1,t}$ with the answer $a$ is calculated as:

$$M_{1,t} = \hat{u}_t \cdot \hat{a}^T$$
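The multi-head cross-attention of steps B21-B23 can be sketched as follows; the scaled dot-product form and the placement of the $W_1$, $W_2$ projections follow the standard multi-head attention pattern and are an assumed reading of the patent's formulas:

```python
# Sketch of steps B21-B23: per-head cross-attention between a context
# sentence and the answer, weighted concatenation, then the word
# similarity matrix M1.
import torch
import torch.nn as nn
import torch.nn.functional as F

d1, s = 200, 4                      # d1 must be divisible by the head count s
W1 = nn.Linear(d1, d1, bias=False)  # training parameters of the multi-head attention
W2 = nn.Linear(d1, d1, bias=False)

def multi_head_cross(E_u: torch.Tensor, E_a: torch.Tensor):
    """E_u: (L_t, d1), E_a: (L_a, d1) -> u_hat, a_hat, M1."""
    dk = d1 // s
    u_heads = E_u.split(dk, dim=-1)  # s sub-vectors of u_t (step B21)
    a_heads = E_a.split(dk, dim=-1)  # s sub-vectors of a
    u_out, a_out = [], []
    for u_h, a_h in zip(u_heads, a_heads):  # each sub-vector pair (step B22)
        u_out.append(F.softmax(u_h @ a_h.T / dk**0.5, dim=-1) @ a_h)  # u_t attends to a
        a_out.append(F.softmax(a_h @ u_h.T / dk**0.5, dim=-1) @ u_h)  # a attends to u_t
    u_hat = W1(torch.cat(u_out, dim=-1))  # weighted concatenation of the heads
    a_hat = W2(torch.cat(a_out, dim=-1))
    M1 = u_hat @ a_hat.T                  # word similarity matrix (step B23)
    return u_hat, a_hat, M1

u_hat, a_hat, M1 = multi_head_cross(torch.randn(7, d1), torch.randn(5, d1))
print(M1.shape)  # torch.Size([7, 5])
```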
step B3: inputting the dialogue context and the initial characterization vector of the answer obtained in the step B1 into the bidirectional GRU network, calculating bidirectional hidden states of the dialogue and the answer, and then calculating a forward semantic characterization matrix and a reverse semantic characterization matrix of the dialogue and the answer. The method specifically comprises the following steps:
Step B31: taking the initial characterization vector of the answer as a sequence of word vectors, inputting it into a bidirectional GRU network, and calculating the forward and reverse hidden state vectors.

The initial characterization vector $E_a$ of the answer is regarded as the sequence $e_a^1, e_a^2, \ldots, e_a^{L_a}$, which is fed in order into the forward GRU to obtain the forward hidden state vectors $\overrightarrow{h}_a^1, \overrightarrow{h}_a^2, \ldots, \overrightarrow{h}_a^{L_a}$ of the answer; $e_a^{L_a}, \ldots, e_a^2, e_a^1$ is fed in order into the reverse GRU to obtain the reverse hidden state vectors $\overleftarrow{h}_a^1, \overleftarrow{h}_a^2, \ldots, \overleftarrow{h}_a^{L_a}$ of the answer, where $\overrightarrow{h}_a^i, \overleftarrow{h}_a^i \in \mathbb{R}^{d_2}$ and $d_2$ is the number of units of the GRU.

Step B32: taking the initial characterization vector of each sentence in the dialogue context as a sequence of word vectors, inputting it into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors.

For $u_t$, the $t$-th sentence in the dialogue context, the sequence $e_{u_t}^1, \ldots, e_{u_t}^{L_t}$ is fed in order into the forward GRU to obtain the forward hidden state vectors $\overrightarrow{h}_{u_t}^1, \ldots, \overrightarrow{h}_{u_t}^{L_t}$ of $u_t$, and $e_{u_t}^{L_t}, \ldots, e_{u_t}^1$ is fed in order into the reverse GRU to obtain the reverse hidden state vectors $\overleftarrow{h}_{u_t}^1, \ldots, \overleftarrow{h}_{u_t}^{L_t}$ of $u_t$, where $\overrightarrow{h}_{u_t}^i, \overleftarrow{h}_{u_t}^i \in \mathbb{R}^{d_2}$.

Step B33: calculating the forward and reverse semantic representation matrices of each sentence in the dialogue context; for $u_t$, the $t$-th sentence in the dialogue context, its forward semantic representation matrix $M_{2,t}$ and reverse semantic representation matrix $M_{3,t}$ with the answer $a$ are calculated as:

$$M_{2,t} = \overrightarrow{H}_{u_t} \cdot \overrightarrow{H}_a^{\,T}, \qquad M_{3,t} = \overleftarrow{H}_{u_t} \cdot \overleftarrow{H}_a^{\,T}$$

where $\overrightarrow{H}_{u_t}$ and $\overrightarrow{H}_a$ stack the forward hidden state vectors of $u_t$ and $a$ row-wise, and $\overleftarrow{H}_{u_t}$ and $\overleftarrow{H}_a$ stack the reverse hidden state vectors.
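Steps B31-B33 can be sketched with a bidirectional `nn.GRU`, whose output concatenates the forward and reverse hidden states along the feature dimension; writing $M_{2,t}$ and $M_{3,t}$ as inner products of hidden states mirrors $M_{1,t}$ above and is an assumed reading:

```python
# Sketch of steps B31-B33: bidirectional GRU over word vectors, then
# forward and reverse semantic representation matrices.
import torch
import torch.nn as nn

d1, d2 = 200, 128  # word-vector dimension and GRU unit count
bigru = nn.GRU(input_size=d1, hidden_size=d2, bidirectional=True, batch_first=True)

def semantic_matrices(E_u: torch.Tensor, E_a: torch.Tensor):
    """E_u: (L_t, d1), E_a: (L_a, d1) -> M2 (forward) and M3 (reverse)."""
    H_u, _ = bigru(E_u.unsqueeze(0))  # (1, L_t, 2*d2), directions concatenated
    H_a, _ = bigru(E_a.unsqueeze(0))
    fwd_u, rev_u = H_u[0, :, :d2], H_u[0, :, d2:]  # split forward / reverse states
    fwd_a, rev_a = H_a[0, :, :d2], H_a[0, :, d2:]
    M2 = fwd_u @ fwd_a.T  # forward semantic representation matrix, (L_t, L_a)
    M3 = rev_u @ rev_a.T  # reverse semantic representation matrix, (L_t, L_a)
    return M2, M3

M2, M3 = semantic_matrices(torch.randn(7, d1), torch.randn(5, d1))
print(M2.shape, M3.shape)  # torch.Size([7, 5]) torch.Size([7, 5])
```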
step B4: combining the word similarity matrix, the forward semantic representation matrix and the reverse semantic representation matrix of the dialogue and the answer into a tensor, inputting the tensor into a two-dimensional convolution neural network, and then performing feature dimension reduction to obtain a representation vector sequence fusing semantic information of the dialogue and the answer. The method specifically comprises the following steps:
Step B41: merging the word similarity matrix, the forward semantic representation matrix, and the reverse semantic representation matrix of $u_t$ and the answer into a tensor:

$$M_t = [M_{1,t}, M_{2,t}, M_{3,t}]$$

Step B42: inputting $M_t$ into a two-dimensional convolutional neural network for convolution and pooling, and then into a fully-connected layer for dimensionality reduction, obtaining a characterization vector $v_t \in \mathbb{R}^{d_3}$ fusing the semantic information of $u_t$ and $a$, where $d_3$ is the dimension after reduction by the fully-connected layer.

Step B43: calculating such a characterization vector for each sentence in the dialogue context $U$ with the answer $a$, obtaining the sequence $\{v_1, v_2, \ldots, v_{L_u}\}$, where $L_u$ is the number of sentences in the dialogue context $U$.
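A sketch of steps B41-B43 follows; the kernel size, channel count, and the fixed 7x5 matrix size are illustrative assumptions (in practice the sentences and the answer would be padded or truncated to common lengths so the fully-connected input size is fixed):

```python
# Sketch of steps B41-B43: stack M1, M2, M3 as a 3-channel tensor, apply a
# 2-D CNN with pooling, and reduce to a d3-dimensional vector v_t.
import torch
import torch.nn as nn

d3 = 64
cnn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),  # a 7x5 input pools down to 3x2
)
fc = nn.Linear(8 * 3 * 2, d3)     # fully-connected dimensionality reduction

def fuse(M1: torch.Tensor, M2: torch.Tensor, M3: torch.Tensor) -> torch.Tensor:
    """M1, M2, M3: (L_t, L_a) matrices -> characterization vector v_t of size d3."""
    M_t = torch.stack([M1, M2, M3]).unsqueeze(0)  # tensor M_t, shape (1, 3, L_t, L_a)
    feats = cnn(M_t).flatten(1)                   # convolution + pooling (step B42)
    return fc(feats).squeeze(0)                   # v_t, fused semantic information

v_t = fuse(torch.randn(7, 5), torch.randn(7, 5), torch.randn(7, 5))
print(v_t.shape)  # torch.Size([64])
```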
Step B5: inputting the characterization vector sequence obtained in the step B4 into a bidirectional GRU network to obtain a characterization vector fusing context dependence of conversation and answer and semantic information
Wherein the sequence of vectors is to be characterizedInputting the result into a bidirectional GRU network, modeling the relationship between the dialog context and the answer through the bidirectional GRU network, and taking the finally output hidden state vector as the context dependency relationship for fusing the dialog and the answer and the representation of semantic informationVector quantityWherein
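Step B5 can then be sketched as a second bidirectional GRU over the per-sentence vectors; taking the last hidden states of both directions as $F$ (rather than, say, pooling over all outputs) is an assumed reading of "the finally output hidden state vector":

```python
# Sketch of step B5: fuse the sequence v_1..v_Lu with a bidirectional GRU
# and take the final hidden states of both directions as F.
import torch
import torch.nn as nn

d3, d2 = 64, 128
fusion_gru = nn.GRU(input_size=d3, hidden_size=d2, bidirectional=True, batch_first=True)

def final_vector(v_seq: torch.Tensor) -> torch.Tensor:
    """v_seq: (L_u, d3) sentence-level vectors -> F of size 2*d2."""
    _, h_n = fusion_gru(v_seq.unsqueeze(0))  # h_n: (2, 1, d2), one per direction
    return torch.cat([h_n[0, 0], h_n[1, 0]], dim=-1)  # F in R^(2*d2)

F_vec = final_vector(torch.randn(10, d3))
print(F_vec.shape)  # torch.Size([256])
```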
Step B6: repeating steps B2-B5 to calculate the characterization vectors $F$ fusing the context dependency and semantic information of the dialogue and the answer for all training samples in the dialogue training set.

Step B7: inputting the characterization vectors $F$ of all samples into the fully-connected layer of the deep learning network model, calculating the gradient of each parameter in the deep network by back-propagation according to the target loss function Loss, and updating the parameters by stochastic gradient descent. The method specifically comprises the following steps:
Step B71: inputting the final characterization vector $F$ into the fully-connected layer, and calculating the probability of the answer belonging to each category using softmax normalization:

$$y = W_s F + b_s, \qquad g_c(U, a) = \mathrm{softmax}(y)$$

where $W_s$ is the weight matrix of the fully-connected layer, $b_s$ is its bias term, and $g_c(U, a)$ is the probability that the answer belongs to category $c$ given the dialogue context $U$ of the training sample $(U, a)$ processed in step B1, with $0 \le g_c(U, a) \le 1$ and $c \in \{\text{correct}, \text{wrong}\}$.

Step B72: calculating the loss value using cross entropy as the loss function, updating the learning rate with the gradient optimization algorithm AdaGrad, and training the model by minimizing the loss function through back-propagation updates of the model parameters.

The minimized loss function Loss is calculated as:

$$Loss = -\sum_{i=1}^{N} \left[\, y_i \log g_{\text{correct}}(U_i, a_i) + (1 - y_i) \log\big(1 - g_{\text{correct}}(U_i, a_i)\big) \right]$$

where $(U_i, a_i)$ denotes the $i$-th training sample in the dialogue training set TS and $y_i \in \{0, 1\}$ is its class label.
Step B8: terminating the training of the deep learning network model when the loss value produced by the model falls below a set threshold or the maximum number of iterations is reached.
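Steps B7-B8 reduce to a standard two-class training loop; the sketch below uses the AdaGrad optimizer named in step B72 and `nn.CrossEntropyLoss`, which fuses the softmax of step B71 with the cross entropy of step B72, with a lone output layer standing in for the full network:

```python
# Sketch of steps B7-B8: output layer (W_s, b_s), cross-entropy loss,
# back-propagation, and AdaGrad parameter updates.
import torch
import torch.nn as nn

d2 = 128
out_layer = nn.Linear(2 * d2, 2)   # W_s, b_s over {correct, wrong}
optimizer = torch.optim.Adagrad(out_layer.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()  # softmax normalization + cross entropy

def train_step(F_batch: torch.Tensor, labels: torch.Tensor) -> float:
    """F_batch: (B, 2*d2) characterization vectors, labels: (B,) in {0, 1}."""
    optimizer.zero_grad()
    loss = criterion(out_layer(F_batch), labels)  # Loss of step B72
    loss.backward()                               # gradients by back-propagation
    optimizer.step()                              # parameter update
    return loss.item()  # compare against the stopping threshold of step B8

loss_val = train_step(torch.randn(8, 2 * d2), torch.randint(0, 2, (8,)))
print(loss_val)
```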
Step C: conversing with the user, inputting the user's question into the trained deep learning network model, and outputting the matched answer.
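At inference time the trained model scores every candidate answer against the dialogue context and the best-scoring candidate is returned; in the sketch below, `score` is a placeholder for the full pipeline of steps B1-B5 plus the output layer:

```python
# Sketch of step C: rank candidate answers by their matching probability
# g_correct(U, a) and return the best match.
import torch

def score(context: list, answer: str) -> float:
    """Placeholder for g_correct(U, a) computed by the trained model."""
    return torch.rand(1).item()

def respond(context: list, candidates: list) -> str:
    return max(candidates, key=lambda a: score(context, a))

print(respond(["hi", "how can I help?"], ["goodbye", "I need a refund"]))
```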
The invention also provides a multi-turn dialogue system adopting the above method, as shown in fig. 2, comprising:
a training set building module for collecting dialogue context and answer data and constructing a dialogue training set TS;
a model training module for training a deep learning network model fusing bidirectional GRU networks by using the dialogue training set TS; and
a multi-turn dialogue module for conversing with the user, inputting the user's questions into the trained deep learning network model, and outputting the best-matched answers.
The above are preferred embodiments of the present invention; all changes made according to the technical scheme of the present invention that produce equivalent functional effects, without exceeding the scope of the technical scheme, belong to the protection scope of the present invention.
Claims (9)
1. A multi-turn dialogue method based on a bidirectional GRU network is characterized by comprising the following steps:
step A: collecting dialogue context and answer data, and constructing a dialogue training set TS;
step B: training a deep learning network model fusing bidirectional GRU networks by using the dialogue training set TS;
step C: conversing with the user, inputting the user's question into the trained deep learning network model, and outputting the matched answer.
2. The method of claim 1, wherein step B specifically comprises the following steps:
step B1: traversing the dialogue training set TS, and encoding the dialogue context and answer of each training sample to obtain initial characterization vectors;
step B2: inputting the initial characterization vectors of the dialogue context and the answer into a multi-head attention mechanism module to obtain semantic characterization vectors of the dialogue and the answer, and calculating the word similarity matrix of the dialogue and the answer;
step B3: inputting the initial characterization vectors of the dialogue context and the answer obtained in step B1 into a bidirectional GRU network, calculating the bidirectional hidden states of the dialogue and the answer, and then calculating the forward and reverse semantic representation matrices of the dialogue and the answer;
step B4: merging the word similarity matrix, the forward semantic representation matrix, and the reverse semantic representation matrix of the dialogue and the answer into a tensor, inputting it into a two-dimensional convolutional neural network, and then performing feature dimensionality reduction to obtain a characterization vector sequence fusing the semantic information of the dialogue and the answer;
step B5: inputting the characterization vector sequence obtained in step B4 into a bidirectional GRU network to obtain a characterization vector $F$ fusing the context dependency and semantic information of the dialogue and the answer;
step B6: repeating steps B2-B5 to calculate the characterization vectors $F$ fusing the context dependency and semantic information of the dialogue and the answer for all training samples in the dialogue training set;
step B7: inputting the characterization vectors $F$ of all samples into the fully-connected layer of the deep learning network model, calculating the gradient of each parameter in the deep network by back-propagation according to the target loss function Loss, and updating the parameters by stochastic gradient descent;
step B8: terminating the training of the deep learning network model when the loss value produced by the model falls below a set threshold or the maximum number of iterations is reached.
3. The method of claim 2, wherein in step B1 the dialogue training set is represented as $TS = \{(U_i, a_i, y_i)\}_{i=1}^{N}$, where $N$ is the number of training samples and $(U, a)$ denotes a training sample consisting of a dialogue context $U$ and an answer $a$; the dialogue context $U$ consists of several sentences of the dialogue process, and each sentence in $U$ and the answer $a$ are encoded separately to obtain their initial characterization vectors; if $u_t$ denotes the $t$-th sentence in the dialogue context $U$, its initial characterization vector is expressed as:

$$E_{u_t} = [e_{u_t}^1, e_{u_t}^2, \ldots, e_{u_t}^{L_t}]$$

and the initial characterization vector of the answer $a$ is expressed as:

$$E_a = [e_a^1, e_a^2, \ldots, e_a^{L_a}]$$

where $L_t$ and $L_a$ denote the numbers of words remaining in $u_t$ and $a$ after word segmentation and stop-word removal, $e_{u_t}^i$ and $e_a^i$ are the word vectors of the $i$-th words of $u_t$ and $a$, looked up in the pre-trained word vector matrix $\mathbf{E} \in \mathbb{R}^{|D| \times d_1}$, $d_1$ is the dimension of a word vector, and $|D|$ is the number of words in the dictionary.
4. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 3, wherein said step B2 specifically comprises the steps of:
step B21: selecting a number of heads $s$ that divides $d_1$ evenly, and for each sentence $u_t$ in the dialogue context, splitting its initial characterization vector $E_{u_t}$ and the answer's initial characterization vector $E_a$ into $s$ sub-vectors along the last dimension, obtaining the sub-vector sequences $\{u_t^1, u_t^2, \ldots, u_t^s\}$ and $\{a^1, a^2, \ldots, a^s\}$, where $u_t^h$ is the $h$-th sub-vector of $u_t$ and $a^h$ is the $h$-th sub-vector of $a$;

step B22: forming a sub-vector pair from each sub-vector of $u_t$ and the corresponding sub-vector of $a$, i.e. $(u_t^h, a^h)$, $h = 1, 2, \ldots, s$, inputting the pairs into the attention mechanism module, and calculating the semantic characterization vector $\hat{u}_t^h$ of $u_t^h$ and the semantic characterization vector $\hat{a}^h$ of $a^h$:

$$\hat{u}_t^h = \mathrm{softmax}\!\left(\frac{u_t^h (a^h)^T}{\sqrt{d_1/s}}\right) a^h, \qquad \hat{a}^h = \mathrm{softmax}\!\left(\frac{a^h (u_t^h)^T}{\sqrt{d_1/s}}\right) u_t^h$$

where $T$ denotes the matrix transpose operation; then calculating the weighted concatenation of $\hat{u}_t^1, \ldots, \hat{u}_t^s$ and of $\hat{a}^1, \ldots, \hat{a}^s$ to obtain the semantic characterization vector $\hat{u}_t$ of $u_t$ and the semantic characterization vector $\hat{a}$ of $a$:

$$\hat{u}_t = [\hat{u}_t^1, \ldots, \hat{u}_t^s]\, W_1, \qquad \hat{a} = [\hat{a}^1, \ldots, \hat{a}^s]\, W_2$$

where $W_1, W_2$ are training parameters of the multi-head attention mechanism;

step B23: calculating the word similarity matrix between each sentence in the dialogue context and the answer; for $u_t$, the $t$-th sentence in the dialogue context, its word similarity matrix $M_{1,t}$ with the answer $a$ is calculated as:

$$M_{1,t} = \hat{u}_t \cdot \hat{a}^T$$
5. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 4, wherein said step B3 specifically comprises the steps of:
step B31: taking the initial characterization vector of the answer as a sequence of word vectors, inputting it into a bidirectional GRU network, and calculating the forward and reverse hidden state vectors;

the initial characterization vector $E_a$ of the answer is regarded as the sequence $e_a^1, e_a^2, \ldots, e_a^{L_a}$, which is fed in order into the forward GRU to obtain the forward hidden state vectors $\overrightarrow{h}_a^1, \overrightarrow{h}_a^2, \ldots, \overrightarrow{h}_a^{L_a}$ of the answer; $e_a^{L_a}, \ldots, e_a^2, e_a^1$ is fed in order into the reverse GRU to obtain the reverse hidden state vectors $\overleftarrow{h}_a^1, \overleftarrow{h}_a^2, \ldots, \overleftarrow{h}_a^{L_a}$ of the answer, where $\overrightarrow{h}_a^i, \overleftarrow{h}_a^i \in \mathbb{R}^{d_2}$ and $d_2$ is the number of units of the GRU;

step B32: taking the initial characterization vector of each sentence in the dialogue context as a sequence of word vectors, inputting it into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors;

for $u_t$, the $t$-th sentence in the dialogue context, the sequence $e_{u_t}^1, \ldots, e_{u_t}^{L_t}$ is fed in order into the forward GRU to obtain the forward hidden state vectors $\overrightarrow{h}_{u_t}^1, \ldots, \overrightarrow{h}_{u_t}^{L_t}$ of $u_t$, and $e_{u_t}^{L_t}, \ldots, e_{u_t}^1$ is fed in order into the reverse GRU to obtain the reverse hidden state vectors $\overleftarrow{h}_{u_t}^1, \ldots, \overleftarrow{h}_{u_t}^{L_t}$ of $u_t$, where $\overrightarrow{h}_{u_t}^i, \overleftarrow{h}_{u_t}^i \in \mathbb{R}^{d_2}$;

step B33: calculating the forward and reverse semantic representation matrices of each sentence in the dialogue context; for $u_t$, the $t$-th sentence in the dialogue context, its forward semantic representation matrix $M_{2,t}$ and reverse semantic representation matrix $M_{3,t}$ with the answer $a$ are calculated as:

$$M_{2,t} = \overrightarrow{H}_{u_t} \cdot \overrightarrow{H}_a^{\,T}, \qquad M_{3,t} = \overleftarrow{H}_{u_t} \cdot \overleftarrow{H}_a^{\,T}$$

where $\overrightarrow{H}_{u_t}$ and $\overrightarrow{H}_a$ stack the forward hidden state vectors of $u_t$ and $a$ row-wise, and $\overleftarrow{H}_{u_t}$ and $\overleftarrow{H}_a$ stack the reverse hidden state vectors.
6. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 5, wherein said step B4 specifically comprises the steps of:

step B41: merging the word similarity matrix, the forward semantic representation matrix, and the reverse semantic representation matrix of $u_t$ and the answer into a tensor:

$$M_t = [M_{1,t}, M_{2,t}, M_{3,t}]$$

step B42: inputting $M_t$ into a two-dimensional convolutional neural network for convolution and pooling, and then into a fully-connected layer for dimensionality reduction, obtaining a characterization vector $v_t \in \mathbb{R}^{d_3}$ fusing the semantic information of $u_t$ and $a$, where $d_3$ is the dimension after reduction by the fully-connected layer;

step B43: calculating such a characterization vector for each sentence in the dialogue context $U$ with the answer $a$, obtaining the sequence $\{v_1, v_2, \ldots, v_{L_u}\}$, where $L_u$ is the number of sentences in the dialogue context $U$.
7. The method of claim 6, wherein in step B5 the characterization vector sequence $\{v_1, v_2, \ldots, v_{L_u}\}$ is input into a bidirectional GRU network, which models the relationship between the dialogue context and the answer, and the finally output hidden state vector is taken as the characterization vector $F$ fusing the context dependency and semantic information of the dialogue and the answer, where $F \in \mathbb{R}^{2 d_2}$.
8. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 7, wherein said step B7 specifically comprises the steps of:
step B71: inputting the final characterization vector $F$ into the fully-connected layer, and calculating the probability of the answer belonging to each category using softmax normalization:

$$y = W_s F + b_s, \qquad g_c(U, a) = \mathrm{softmax}(y)$$

where $W_s$ is the weight matrix of the fully-connected layer, $b_s$ is its bias term, and $g_c(U, a)$ is the probability that the answer belongs to category $c$ given the dialogue context $U$ of the training sample $(U, a)$ processed in step B1, with $0 \le g_c(U, a) \le 1$ and $c \in \{\text{correct}, \text{wrong}\}$;

step B72: calculating the loss value using cross entropy as the loss function, updating the learning rate with the gradient optimization algorithm AdaGrad, and training the model by minimizing the loss function through back-propagation updates of the model parameters;

the minimized loss function Loss is calculated as:

$$Loss = -\sum_{i=1}^{N} \left[\, y_i \log g_{\text{correct}}(U_i, a_i) + (1 - y_i) \log\big(1 - g_{\text{correct}}(U_i, a_i)\big) \right]$$

where $(U_i, a_i)$ denotes the $i$-th training sample in the dialogue training set TS and $y_i \in \{0, 1\}$ is its class label.
9. A multi-turn dialogue system employing the method of any one of claims 1-8, comprising:
a training set building module for collecting dialogue context and answer data and constructing a dialogue training set TS;
a model training module for training a deep learning network model fusing bidirectional GRU networks by using the dialogue training set TS; and
a multi-turn dialogue module for conversing with the user, inputting the user's questions into the trained deep learning network model, and outputting the best-matched answers.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010067240.9A (granted as CN111274375B) | 2020-01-20 | 2020-01-20 | Multi-turn dialogue method and system based on bidirectional GRU network |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN111274375A | 2020-06-12 |
| CN111274375B | 2022-06-14 |
Patent Citations (5)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN110020015A | 2017-12-29 | 2019-07-16 | Answer generation method and system for a dialogue system |
| CN108874972A | 2018-06-08 | 2018-11-23 | Multi-turn emotional dialogue method based on deep learning |
| US20190385051A1 | 2018-06-14 | 2019-12-19 | Virtual agent with a dialogue management system and method of training a dialogue management system |
| CN109460463A | 2018-11-15 | 2019-03-12 | Model training method, device, terminal and storage medium based on data processing |
| CN109933659A | 2019-03-22 | 2019-06-25 | Vehicle-mounted multi-turn dialogue method oriented to the travel field |
Non-Patent Citations (1)

| Title |
|---|
| Song Haoyu et al., "DQN-based multi-turn dialogue policy learning for open domains", Journal of Chinese Information Processing (中文信息学报) |
Cited By (12)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| WO2021147405A1 | 2020-08-31 | 2021-07-29 | Customer-service statement quality detection method and related device |
| CN112434143A | 2020-11-20 | 2021-03-02 | Dialogue method, storage medium and system based on GRU hidden-state constraints |
| CN112434143B | 2020-11-20 | 2022-12-09 | Dialogue method, storage medium and system based on GRU hidden-state constraints |
| CN112632236A | 2020-12-02 | 2021-04-09 | Multi-turn dialogue model based on an improved sequence matching network |
| CN112818105A | 2021-02-05 | 2021-05-18 | Multi-turn dialogue method and system fusing context information |
| CN112818105B | 2021-02-05 | 2021-12-07 | Multi-turn dialogue method and system fusing context information |
| CN113157855A | 2021-02-22 | 2021-07-23 | Text summarization method and system fusing semantic and context information |
| CN114443827A | 2022-01-28 | 2022-05-06 | Local information perception dialogue method and system based on a pre-trained language model |
| CN114490991A | 2022-01-28 | 2022-05-13 | Dialogue-structure-aware dialogue method and system based on fine-grained local information enhancement |
| CN114564568A | 2022-02-25 | 2022-05-31 | Dialogue state tracking method and system based on knowledge enhancement and context awareness |
| CN115276697A | 2022-07-22 | 2022-11-01 | Coast radio station communication system integrating intelligent voice |
| CN116128438A | 2022-12-27 | 2023-05-16 | Intelligent community management system based on big data record information |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |