CN111274375A - Multi-turn dialogue method and system based on bidirectional GRU network - Google Patents

Multi-turn dialogue method and system based on bidirectional GRU network

Info

Publication number
CN111274375A
Authority
CN
China
Prior art keywords
answer, vector, context, dialog, inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010067240.9A
Other languages
Chinese (zh)
Other versions
CN111274375B (en)
Inventor
陈羽中
谢琪
刘漳辉
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202010067240.9A priority Critical patent/CN111274375B/en
Publication of CN111274375A publication Critical patent/CN111274375A/en
Application granted granted Critical
Publication of CN111274375B publication Critical patent/CN111274375B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 16/3347 Query execution using vector based model
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a multi-turn dialogue method and system based on a bidirectional GRU network, wherein the method comprises the following steps: step A: collecting dialogue context and answer data and constructing a dialogue training set D; step B: using the dialogue training set D to train a deep learning network model M fusing a bidirectional GRU network; step C: conversing with the user, inputting the user question into the trained deep learning network model M, and outputting the matched answer. The method and the system help improve the matching of answers to user questions.

Description

Multi-turn dialogue method and system based on bidirectional GRU network
Technical Field
The invention relates to the field of natural language processing, in particular to a multi-turn dialogue method and a multi-turn dialogue system based on a bidirectional GRU network.
Background
In recent years, the rapid development of deep learning and neural networks has revolutionized the field of artificial intelligence. As one of the core technologies in this field, multi-turn dialogue has become a research hotspot: it can be widely applied in industries such as human-computer interaction, smart homes, intelligent customer service, intelligent family education, and social robots, and carries great research significance as well as academic and application value, earning sustained attention from academia and close attention from industry.
Lowe et al. directly concatenate the dialogue context utterances to form a concatenated context matrix for matching with the answer, thereby taking the overall semantics of the dialogue context into account. Yan et al. concatenate the context statements with the input message as a new query and perform matching with a deep neural network architecture. Zhou et al. improve multi-turn response selection using a multi-view model that contains both an utterance view and a word view. Zhou et al. also propose an attention-based algorithm for matching the dialogue context and the answer; the method constructs two matching matrices using a scaled self-attention mechanism and an interactive attention mechanism, and its effectiveness has been verified. Wu et al. match the candidate answer against each context sentence and use an RNN to preserve the ordering of sentence semantics, which improves system performance and indicates that interaction between the answer and each context sentence is effective. Zhou et al. likewise interact with each context sentence, but represent sentences at different levels with an encoding layer similar to the Transformer instead of an RNN; they use attention mechanisms to extract more dependency information between the dialogue and the answer and sum all the information to compute the matching degree. Existing attention-mechanism models can extract more dependency information between dialogues and answers, but they are easily affected by noise and cannot compensate for long-term dependencies.
Disclosure of Invention
The invention aims to provide a multi-turn dialogue method and system based on a bidirectional GRU network that help improve the matching of answers to user questions.
To achieve the above purpose, the invention adopts the following technical scheme: a multi-turn dialogue method based on a bidirectional GRU network, comprising the following steps:
step A: collecting dialogue context and answer data, and constructing a dialogue training set TS;
step B: training a deep learning network model fusing a bidirectional GRU network by using the dialogue training set TS;
step C: conversing with the user, inputting the user question into the trained deep learning network model, and outputting a matched answer.
Further, the step B specifically includes the following steps:
step B1: traversing the dialogue training set TS, and coding the dialogue context and answer of each training sample to obtain initial characterization vectors;
step B2: inputting the initial characterization vectors of the conversation context and the answer into a multi-head attention mechanism module to obtain semantic characterization vectors of the conversation and the answer, and calculating a word similarity matrix of the conversation and the answer;
step B3: inputting the dialogue context and the initial characterization vector of the answer obtained in the step B1 into a bidirectional GRU network, calculating bidirectional hidden states of the dialogue and the answer, and then calculating a forward semantic characterization matrix and a reverse semantic characterization matrix of the dialogue and the answer;
step B4: merging a word similarity matrix, a forward semantic representation matrix and a reverse semantic representation matrix of the conversation and the answer into a tensor, inputting the tensor into a two-dimensional convolution neural network, and then performing feature dimensionality reduction to obtain a representation vector sequence fusing semantic information of the conversation and the answer;
step B5: inputting the characterization vector sequence obtained in step B4 into a bidirectional GRU network to obtain a characterization vector v̂ fusing the context dependency and semantic information of the dialog and the answer;
step B6: repeating steps B2-B5 to calculate, for all training samples in the dialogue training set, the characterization vector v̂ fusing the context dependency and semantic information of the dialog and the answer;
step B7: inputting the characterization vectors v̂ of all samples into the fully connected layer of the deep learning network model, calculating the gradient of each parameter in the deep network by back propagation according to the target loss function Loss, and updating the parameters by stochastic gradient descent;
step B8: terminating the training of the deep learning network model when the loss value it generates is smaller than a set threshold or the maximum number of iterations is reached.
Further, in step B1, the dialogue training set is represented as TS = {(U, a)_i}_{i=1}^N, where N denotes the number of training samples and (U, a) denotes a training sample in the dialogue training set TS consisting of a dialogue context U and an answer a; the dialogue context U consists of the several sentences of a conversation, and each sentence in the dialogue context U and the answer a are encoded separately to obtain initial characterization vectors. If u_t denotes the t-th sentence in the dialogue context U, its initial characterization vector E_{u_t} is expressed as:

E_{u_t} = [e_1^{u_t}, e_2^{u_t}, …, e_{L_t}^{u_t}]

The initial characterization vector of the answer a is expressed as:

E_a = [e_1^a, e_2^a, …, e_{L_a}^a]

where E_{u_t} ∈ R^{L_t×d_1} and E_a ∈ R^{L_a×d_1}; L_t and L_a respectively denote the numbers of words remaining in u_t and a after word segmentation and removal of stop words; e_i^{u_t} and e_i^a are respectively the word vectors of the i-th words of u_t and a, looked up in the pre-trained word-vector matrix E ∈ R^{|D|×d_1}, where d_1 denotes the dimension of a word vector and |D| denotes the number of words in the dictionary.
Further, the step B2 specifically includes the following steps:
step B21: selecting a head count s that divides d_1; for each sentence in the dialogue context, dividing its initial characterization vector E_{u_t} and the answer's initial characterization vector E_a into s sub-vectors along the last dimension, obtaining the sub-vector sequences [E_{u_t}^1, …, E_{u_t}^s] and [E_a^1, …, E_a^s], where E_{u_t}^h is the h-th sub-vector of u_t and E_a^h is the h-th sub-vector of a;
step B22: pairing each sub-vector of E_{u_t} with the corresponding sub-vector of E_a, i.e. (E_{u_t}^h, E_a^h), h = 1, 2, …, s, inputting the pairs into the attention mechanism module, and computing the semantic characterization vector O_{u_t}^h of E_{u_t}^h and the semantic characterization vector O_a^h of E_a^h, where O_{u_t}^h is calculated as:

O_{u_t}^h = softmax(E_{u_t}^h (E_a^h)^T / √(d_1/s)) · E_a^h

and O_a^h is calculated as:

O_a^h = softmax(E_a^h (E_{u_t}^h)^T / √(d_1/s)) · E_{u_t}^h

where T denotes the matrix transpose operation;
computing the weighted concatenation of O_{u_t}^1, …, O_{u_t}^s to obtain the semantic characterization vector Ê_{u_t} of u_t, expressed as:

Ê_{u_t} = [O_{u_t}^1, O_{u_t}^2, …, O_{u_t}^s] · W_1

computing the weighted concatenation of O_a^1, …, O_a^s to obtain the semantic characterization vector Ê_a of a, expressed as:

Ê_a = [O_a^1, O_a^2, …, O_a^s] · W_2

where W_1, W_2 are training parameters of the multi-head attention mechanism;
step B23: calculating the word similarity matrix of each sentence in the dialogue context with the answer; the word similarity matrix M_{1,t} ∈ R^{L_t×L_a} of the t-th sentence u_t in the dialogue context with the answer a is calculated as:

M_{1,t} = Ê_{u_t} · (Ê_a)^T
further, the step B3 specifically includes the following steps:
step B31: taking the initial characterization vector of the answer as a sequence of word vectors, inputting it into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors;
the initial characterization vector E_a of the answer is regarded as the sequence formed by e_1^a, …, e_{L_a}^a; inputting it into the forward GRU in order yields the forward hidden state vectors H_a^f of the answer, and inputting it into the reverse GRU in reverse order yields the reverse hidden state vectors H_a^b, where H_a^f, H_a^b ∈ R^{L_a×d_2} and d_2 is the number of GRU units;
step B32: regarding the initial characterization vector of each sentence in the dialogue context as a sequence of word vectors, inputting it into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors;
with u_t denoting the t-th sentence in the dialogue context, E_{u_t} is regarded as the sequence formed by e_1^{u_t}, …, e_{L_t}^{u_t}; inputting it into the forward GRU in order yields the forward hidden state vectors H_{u_t}^f of u_t, and inputting it into the reverse GRU in reverse order yields the reverse hidden state vectors H_{u_t}^b, where H_{u_t}^f, H_{u_t}^b ∈ R^{L_t×d_2};
step B33: calculating the forward and reverse semantic representation matrices of each sentence in the dialogue context; with u_t denoting the t-th sentence in the dialogue context, its forward semantic representation matrix M_{2,t} and reverse semantic representation matrix M_{3,t} with the answer a are calculated as:

M_{2,t} = H_{u_t}^f · (H_a^f)^T

M_{3,t} = H_{u_t}^b · (H_a^b)^T

where M_{2,t}, M_{3,t} ∈ R^{L_t×L_a}.
further, the step B4 specifically includes the following steps:
step B41: merging M_{1,t}, M_{2,t}, M_{3,t} to obtain the tensor M_t ∈ R^{3×L_t×L_a}:

M_t = [M_{1,t}, M_{2,t}, M_{3,t}]

step B42: inputting M_t into a two-dimensional convolutional neural network for convolution and pooling, and then into a fully connected layer for dimensionality reduction, obtaining the characterization vector v_t ∈ R^{d_3} that fuses the semantic information of u_t and a, where d_3 is the dimension after reduction by the fully connected layer;
step B43: calculating, for each sentence in the dialogue context U, the characterization vector of its semantic information with the answer a, obtaining the sequence [v_1, v_2, …, v_{L_u}], where L_u is the number of sentences in the dialogue context U.
Further, in step B5, the characterization vector sequence [v_1, v_2, …, v_{L_u}] is input into a bidirectional GRU network, which models the relationship between the dialogue context and the answer; the finally output hidden state vector is taken as the characterization vector v̂ fusing the context dependency and semantic information of the dialog and the answer, v̂ concatenating the final hidden states of the forward and reverse GRUs.
Further, the step B7 specifically includes the following steps:
step B71: inputting the final characterization vector v̂ into the fully connected layer and calculating the probability of the answer belonging to each category using softmax normalization, with the calculation formulas:

y = W_s · v̂ + b_s

g_c(U, a) = softmax(y)

where W_s is the weight matrix of the fully connected layer, b_s is its bias term, and g_c(U, a) is the probability that the answer belongs to category c given the dialogue context U of the training sample (U, a) processed in step B1, with 0 ≤ g_c(U, a) ≤ 1 and c ∈ {correct, wrong};
step B72: calculating the loss value using cross entropy as the loss function, updating the learning rate with the gradient optimization algorithm AdaGrad, and training the model by minimizing the loss function through back-propagation iterations that update the model parameters;
the loss function Loss is calculated as:

Loss = −Σ_{i=1}^{N} [ y_i · log g_correct(U_i, a_i) + (1 − y_i) · log(1 − g_correct(U_i, a_i)) ]

where (U_i, a_i) denotes the i-th training sample in the dialogue training set TS and y_i ∈ {0, 1} is its class label.
The invention also provides a multi-turn dialogue system adopting the above method, comprising:
a training set building module, for collecting dialogue context and answer data and building the dialogue training set TS;
a model training module, for training a deep learning network model fusing the bidirectional GRU network by using the dialogue training set TS; and
a multi-turn dialogue module, for conversing with the user, inputting the user questions into the trained deep learning network model, and outputting the best-matched answers.
Compared with the prior art, the invention has the following beneficial effects: the method and system capture long-term dependencies using multi-head attention, whose finer granularity relative to the traditional attention mechanism reduces the influence of noise. Meanwhile, the bidirectional GRU better captures the temporal relations among sentences and improves both the accuracy and the matching of answers to the questions asked by the user; the invention therefore has strong practicability and a wide application prospect.
Drawings
Fig. 1 is a flowchart of a method implementation of an embodiment of the invention.
Fig. 2 is a schematic structural diagram of a system according to an embodiment of the present invention.
FIG. 3 is a diagram of a model architecture according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
The invention provides a multi-turn dialogue method based on a bidirectional GRU network, which, as shown in figure 1, comprises the following steps:
step A: collecting dialogue context and answer data to construct the dialogue training set TS.
step B: training a deep learning network model fusing the bidirectional GRU network by using the dialogue training set TS.
FIG. 3 is an architecture diagram of a deep learning network model in an embodiment of the invention. Training the model using a dialog training set TS specifically comprises the steps of:
step B1: and traversing the dialog training set TS, and coding the dialog context and answer of each training sample to obtain an initial characterization vector.
Wherein the dialogue training set is represented as TS = {(U, a)_i}_{i=1}^N, where N denotes the number of training samples and (U, a) denotes a training sample in the dialogue training set TS consisting of a dialogue context U and an answer a; the dialogue context U consists of the several sentences of a conversation, and each sentence in the dialogue context U and the answer a are encoded separately to obtain initial characterization vectors. If u_t denotes the t-th sentence in the dialogue context U, its initial characterization vector E_{u_t} is expressed as:

E_{u_t} = [e_1^{u_t}, e_2^{u_t}, …, e_{L_t}^{u_t}]

The initial characterization vector of the answer a is expressed as:

E_a = [e_1^a, e_2^a, …, e_{L_a}^a]

where E_{u_t} ∈ R^{L_t×d_1} and E_a ∈ R^{L_a×d_1}; L_t and L_a respectively denote the numbers of words remaining in u_t and a after word segmentation and removal of stop words; e_i^{u_t} and e_i^a are respectively the word vectors of the i-th words of u_t and a, looked up in the pre-trained word-vector matrix E ∈ R^{|D|×d_1}, where d_1 denotes the dimension of a word vector and |D| denotes the number of words in the dictionary.
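As a minimal illustration of the encoding in step B1, the sketch below looks up pre-trained word vectors for a tokenized, stop-word-filtered sentence. The toy dictionary (|D| = 4, d_1 = 3) and all names are invented for this example and are not taken from the patent.

```python
import numpy as np

def encode_sentence(tokens, word2id, embedding_matrix):
    """Look up each token's pre-trained word vector, producing an
    L x d1 matrix with one row per remaining word."""
    ids = [word2id[w] for w in tokens if w in word2id]
    return embedding_matrix[ids]

# toy pre-trained word-vector matrix E with |D| = 4 words, d1 = 3
rng = np.random.default_rng(0)
E = rng.normal(size=(4, 3))
word2id = {"how": 0, "are": 1, "you": 2, "fine": 3}

E_ut = encode_sentence(["how", "are", "you"], word2id, E)
print(E_ut.shape)  # (3, 3): L_t = 3 words, d1 = 3
```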
Step B2: and inputting the initial characterization vectors of the conversation context and the answer into a multi-head attention mechanism module to obtain semantic characterization vectors of the conversation and the answer, and calculating a word similarity matrix of the conversation and the answer. The method specifically comprises the following steps:
step B21: selecting a head count s that divides d_1; for each sentence in the dialogue context, dividing its initial characterization vector E_{u_t} and the answer's initial characterization vector E_a into s sub-vectors along the last dimension, obtaining the sub-vector sequences [E_{u_t}^1, …, E_{u_t}^s] and [E_a^1, …, E_a^s], where E_{u_t}^h is the h-th sub-vector of u_t and E_a^h is the h-th sub-vector of a;
step B22: pairing each sub-vector of E_{u_t} with the corresponding sub-vector of E_a, i.e. (E_{u_t}^h, E_a^h), h = 1, 2, …, s, inputting the pairs into the attention mechanism module, and computing the semantic characterization vector O_{u_t}^h of E_{u_t}^h and the semantic characterization vector O_a^h of E_a^h, where O_{u_t}^h is calculated as:

O_{u_t}^h = softmax(E_{u_t}^h (E_a^h)^T / √(d_1/s)) · E_a^h

and O_a^h is calculated as:

O_a^h = softmax(E_a^h (E_{u_t}^h)^T / √(d_1/s)) · E_{u_t}^h

where T denotes the matrix transpose operation;
computing the weighted concatenation of O_{u_t}^1, …, O_{u_t}^s to obtain the semantic characterization vector Ê_{u_t} of u_t, expressed as:

Ê_{u_t} = [O_{u_t}^1, O_{u_t}^2, …, O_{u_t}^s] · W_1

computing the weighted concatenation of O_a^1, …, O_a^s to obtain the semantic characterization vector Ê_a of a, expressed as:

Ê_a = [O_a^1, O_a^2, …, O_a^s] · W_2

where W_1, W_2 are training parameters of the multi-head attention mechanism;
step B23: calculating the word similarity matrix of each sentence in the dialogue context with the answer; the word similarity matrix M_{1,t} ∈ R^{L_t×L_a} of the t-th sentence u_t in the dialogue context with the answer a is calculated as:

M_{1,t} = Ê_{u_t} · (Ê_a)^T
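The head splitting and matching of steps B21-B23 can be sketched in numpy as below. Since the patent's equations are only available as images in this source, the scaled dot-product form of the attention and the omission of the projection parameters W_1, W_2 are assumptions; the head count s = 2 and all shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_match(E_u, E_a, s):
    """Split the d1-dim vectors into s head sub-vectors, let each u head
    attend over the matching a head (and vice versa), then concatenate."""
    d = E_u.shape[-1] // s  # per-head dimension d1/s
    u_out, a_out = [], []
    for Uh, Ah in zip(np.split(E_u, s, axis=-1), np.split(E_a, s, axis=-1)):
        u_out.append(softmax(Uh @ Ah.T / np.sqrt(d)) @ Ah)  # O_u^h
        a_out.append(softmax(Ah @ Uh.T / np.sqrt(d)) @ Uh)  # O_a^h
    return np.concatenate(u_out, axis=-1), np.concatenate(a_out, axis=-1)

rng = np.random.default_rng(1)
E_u = rng.normal(size=(5, 6))   # L_t = 5 words, d1 = 6
E_a = rng.normal(size=(4, 6))   # L_a = 4 words
U_hat, A_hat = multi_head_match(E_u, E_a, s=2)
M1 = U_hat @ A_hat.T            # word similarity matrix, shape (L_t, L_a)
print(M1.shape)  # (5, 4)
```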
step B3: inputting the dialogue context and the initial characterization vector of the answer obtained in the step B1 into the bidirectional GRU network, calculating bidirectional hidden states of the dialogue and the answer, and then calculating a forward semantic characterization matrix and a reverse semantic characterization matrix of the dialogue and the answer. The method specifically comprises the following steps:
step B31: taking the initial characterization vector of the answer as a sequence of word vectors, inputting it into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors;
the initial characterization vector E_a of the answer is regarded as the sequence formed by e_1^a, …, e_{L_a}^a; inputting it into the forward GRU in order yields the forward hidden state vectors H_a^f of the answer, and inputting it into the reverse GRU in reverse order yields the reverse hidden state vectors H_a^b, where H_a^f, H_a^b ∈ R^{L_a×d_2} and d_2 is the number of GRU units;
step B32: regarding the initial characterization vector of each sentence in the dialogue context as a sequence of word vectors, inputting it into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors;
with u_t denoting the t-th sentence in the dialogue context, E_{u_t} is regarded as the sequence formed by e_1^{u_t}, …, e_{L_t}^{u_t}; inputting it into the forward GRU in order yields the forward hidden state vectors H_{u_t}^f of u_t, and inputting it into the reverse GRU in reverse order yields the reverse hidden state vectors H_{u_t}^b, where H_{u_t}^f, H_{u_t}^b ∈ R^{L_t×d_2};
step B33: calculating the forward and reverse semantic representation matrices of each sentence in the dialogue context; with u_t denoting the t-th sentence in the dialogue context, its forward semantic representation matrix M_{2,t} and reverse semantic representation matrix M_{3,t} with the answer a are calculated as:

M_{2,t} = H_{u_t}^f · (H_a^f)^T

M_{3,t} = H_{u_t}^b · (H_a^b)^T

where M_{2,t}, M_{3,t} ∈ R^{L_t×L_a}.
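A toy numpy rendering of step B3 might look as follows. The GRU gate equations are the standard ones; sharing one weight set across the sentence, the answer, and both directions is a simplification made only to keep the sketch short, not something the patent specifies.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_states(X, Wz, Wr, Wh, d2):
    """Run a single-direction GRU over the rows of X; return all hidden states."""
    h = np.zeros(d2)
    out = []
    for x in X:
        xh = np.concatenate([x, h])
        z = sigmoid(Wz @ xh)                                  # update gate
        r = sigmoid(Wr @ xh)                                  # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([x, r * h]))    # candidate state
        h = (1 - z) * h + z * h_tilde
        out.append(h)
    return np.stack(out)

rng = np.random.default_rng(2)
d1, d2, Lt, La = 4, 3, 5, 6
W = lambda: rng.normal(scale=0.1, size=(d2, d1 + d2))
Wz, Wr, Wh = W(), W(), W()
E_u, E_a = rng.normal(size=(Lt, d1)), rng.normal(size=(La, d1))

Hu_fwd = gru_states(E_u, Wz, Wr, Wh, d2)              # forward states of u_t
Hu_bwd = gru_states(E_u[::-1], Wz, Wr, Wh, d2)[::-1]  # reverse states of u_t
Ha_fwd = gru_states(E_a, Wz, Wr, Wh, d2)
Ha_bwd = gru_states(E_a[::-1], Wz, Wr, Wh, d2)[::-1]

M2 = Hu_fwd @ Ha_fwd.T   # forward semantic representation matrix, (Lt, La)
M3 = Hu_bwd @ Ha_bwd.T   # reverse semantic representation matrix, (Lt, La)
print(M2.shape, M3.shape)  # (5, 6) (5, 6)
```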
step B4: combining the word similarity matrix, the forward semantic representation matrix and the reverse semantic representation matrix of the dialogue and the answer into a tensor, inputting the tensor into a two-dimensional convolution neural network, and then performing feature dimension reduction to obtain a representation vector sequence fusing semantic information of the dialogue and the answer. The method specifically comprises the following steps:
step B41: merging M_{1,t}, M_{2,t}, M_{3,t} to obtain the tensor M_t ∈ R^{3×L_t×L_a}:

M_t = [M_{1,t}, M_{2,t}, M_{3,t}]

step B42: inputting M_t into a two-dimensional convolutional neural network for convolution and pooling, and then into a fully connected layer for dimensionality reduction, obtaining the characterization vector v_t ∈ R^{d_3} that fuses the semantic information of u_t and a, where d_3 is the dimension after reduction by the fully connected layer;
step B43: calculating, for each sentence in the dialogue context U, the characterization vector of its semantic information with the answer a, obtaining the sequence [v_1, v_2, …, v_{L_u}], where L_u is the number of sentences in the dialogue context U.
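Step B4 can be sketched as stacking the three matrices into a three-channel tensor and passing it through one convolution filter, a ReLU, a 2×2 max-pool, and a fully connected projection. The single filter and all sizes here are illustrative assumptions rather than the patent's actual hyperparameters.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive valid 2-D convolution of one channel with one kernel."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(3)
Lt, La, d3 = 6, 6, 4
M1, M2, M3 = (rng.normal(size=(Lt, La)) for _ in range(3))
Mt = np.stack([M1, M2, M3])          # tensor M_t of shape (3, Lt, La)

kernel = rng.normal(size=(3, 3, 3))  # one 3x3 filter over the 3 input channels
fmap = sum(conv2d_valid(Mt[c], kernel[c]) for c in range(3))  # (4, 4)
fmap = np.maximum(fmap, 0)                                    # ReLU
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))            # 2x2 max-pooling

W_fc = rng.normal(size=(d3, pooled.size))
v = W_fc @ pooled.ravel()            # fused d3-dim characterization vector v_t
print(v.shape)  # (4,)
```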
Step B5: inputting the characterization vector sequence obtained in the step B4 into a bidirectional GRU network to obtain a characterization vector fusing context dependence of conversation and answer and semantic information
Figure BDA00023763411500000911
Wherein the characterization vector sequence [v_1, v_2, …, v_{L_u}] is input into the bidirectional GRU network, which models the relationship between the dialogue context and the answer; the finally output hidden state vector is taken as the characterization vector v̂ fusing the context dependency and semantic information of the dialog and the answer, v̂ concatenating the final hidden states of the forward and reverse GRUs.
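Assuming the final vector concatenates the last hidden states of the forward and reverse GRUs (the exact output dimension is not recoverable from the source images), step B5 could be sketched as:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_last(X, Wz, Wr, Wh, d2):
    """Return the final hidden state of a GRU run over the rows of X."""
    h = np.zeros(d2)
    for x in X:
        xh = np.concatenate([x, h])
        z, r = sigmoid(Wz @ xh), sigmoid(Wr @ xh)
        h = (1 - z) * h + z * np.tanh(Wh @ np.concatenate([x, r * h]))
    return h

rng = np.random.default_rng(4)
d3, d2, Lu = 4, 3, 5                  # Lu sentences, each fused into a d3 vector
V = rng.normal(size=(Lu, d3))         # sequence v_1 .. v_Lu from step B4
W = lambda: rng.normal(scale=0.1, size=(d2, d3 + d2))

# concatenate final forward and reverse hidden states into v-hat
v_hat = np.concatenate([gru_last(V, W(), W(), W(), d2),
                        gru_last(V[::-1], W(), W(), W(), d2)])
print(v_hat.shape)  # (6,) i.e. 2 * d2
```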
Step B6: repeating the steps B2-B5, calculating the context dependency relationship of the fused dialog and answer of all the training samples in the dialog training set and the characterization vector of the semantic information
Figure BDA0002376341150000103
Step B7: feature vectors of all samples
Figure BDA0002376341150000104
Inputting the data into a full-connection layer of a deep learning network model, calculating the gradient of each parameter in the deep network by using a back propagation method according to a target loss function loss, and updating the parameter by using a random gradient descent method. The method specifically comprises the following steps:
step B71: inputting the final characterization vector v̂ into the fully connected layer and calculating the probability of the answer belonging to each category using softmax normalization, with the calculation formulas:

y = W_s · v̂ + b_s

g_c(U, a) = softmax(y)

where W_s is the weight matrix of the fully connected layer, b_s is its bias term, and g_c(U, a) is the probability that the answer belongs to category c given the dialogue context U of the training sample (U, a) processed in step B1, with 0 ≤ g_c(U, a) ≤ 1 and c ∈ {correct, wrong};
step B72: calculating the loss value using cross entropy as the loss function, updating the learning rate with the gradient optimization algorithm AdaGrad, and training the model by minimizing the loss function through back-propagation iterations that update the model parameters;
the loss function Loss is calculated as:

Loss = −Σ_{i=1}^{N} [ y_i · log g_correct(U_i, a_i) + (1 − y_i) · log(1 − g_correct(U_i, a_i)) ]

where (U_i, a_i) denotes the i-th training sample in the dialogue training set TS and y_i ∈ {0, 1} is its class label.
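The classification and training of steps B71-B72 (softmax output, cross-entropy loss, AdaGrad updates) might be sketched on random toy data as below; the learning rate, iteration count, and data shapes are arbitrary choices for illustration only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(5)
d, N = 6, 8
Ws, bs = rng.normal(scale=0.1, size=(2, d)), np.zeros(2)
G_W, G_b = np.zeros_like(Ws), np.zeros_like(bs)   # AdaGrad accumulators
lr, eps = 0.1, 1e-8

V = rng.normal(size=(N, d))        # final characterization vectors of N samples
y = rng.integers(0, 2, size=N)     # class labels y_i in {0, 1}

losses = []
for _ in range(50):
    loss, gW, gb = 0.0, np.zeros_like(Ws), np.zeros_like(bs)
    for v, yi in zip(V, y):
        p = softmax(Ws @ v + bs)                  # g_c(U, a)
        loss -= np.log(p[yi])                     # cross-entropy term
        dlogits = p.copy()
        dlogits[yi] -= 1.0                        # gradient of CE w.r.t. logits
        gW += np.outer(dlogits, v)
        gb += dlogits
    losses.append(loss)
    G_W += gW ** 2                                # AdaGrad accumulation
    G_b += gb ** 2
    Ws -= lr * gW / (np.sqrt(G_W) + eps)          # per-parameter scaled step
    bs -= lr * gb / (np.sqrt(G_b) + eps)

print(losses[-1] < losses[0])  # loss decreased over training
```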
Step B8: and terminating the training of the deep learning network model when the loss value generated by the deep learning network model is smaller than a set threshold value or reaches the maximum iteration number.
And C: and (5) carrying out dialogue with the user, inputting the user question into the trained deep learning network model, and outputting a matched answer.
The invention also provides a multi-turn dialogue system adopting the above method, as shown in fig. 2, comprising:
a training set building module, for collecting dialogue context and answer data and building the dialogue training set TS;
a model training module, for training a deep learning network model fusing the bidirectional GRU network by using the dialogue training set TS; and
a multi-turn dialogue module, for conversing with the user, inputting the user questions into the trained deep learning network model, and outputting the best-matched answers.
The above are preferred embodiments of the present invention; all changes made according to the technical scheme of the present invention that produce equivalent functional effects remain within the protection scope of the present invention.

Claims (9)

1. A multi-turn dialogue method based on a bidirectional GRU network, characterized by comprising the following steps:
step A: collecting dialog context and answer data, and constructing a dialog training set TS;
step B: training a deep learning network model incorporating a bidirectional GRU network using the dialog training set TS;
step C: conversing with the user, inputting the user's question into the trained deep learning network model, and outputting the matched answer.
2. The method of claim 1, wherein step B specifically comprises the following steps:
step B1: traversing the dialog training set TS, and encoding the dialog context and answer of each training sample to obtain their initial characterization vectors;
step B2: inputting the initial characterization vectors of the dialog context and the answer into a multi-head attention mechanism module to obtain semantic representation vectors of the dialog and the answer, and calculating the word similarity matrix of the dialog and the answer;
step B3: inputting the initial characterization vectors of the dialog context and the answer obtained in step B1 into a bidirectional GRU network, calculating the bidirectional hidden states of the dialog and the answer, and then calculating the forward and reverse semantic representation matrices of the dialog and the answer;
step B4: merging the word similarity matrix, forward semantic representation matrix, and reverse semantic representation matrix of the dialog and the answer into a tensor, inputting the tensor into a two-dimensional convolutional neural network, and then performing feature dimensionality reduction to obtain a sequence of characterization vectors fusing the semantic information of the dialog and the answer;
step B5: inputting the characterization vector sequence obtained in step B4 into a bidirectional GRU network to obtain a characterization vector fusing the context dependencies and semantic information of the dialog and the answer;
step B6: repeating steps B2-B5 to calculate, for all training samples in the dialog training set, the characterization vector fusing the context dependencies and semantic information of the dialog and answer;
step B7: inputting the characterization vectors of all samples into the fully connected layer of the deep learning network model, calculating the gradient of each parameter in the deep network by the back-propagation method according to the target loss function Loss, and updating the parameters by stochastic gradient descent;
step B8: terminating the training of the deep learning network model when the loss value generated by the deep learning network model is smaller than a set threshold or the maximum number of iterations is reached.
3. The method of claim 2, wherein in step B1, the dialog training set is expressed as TS = {(U, a)}, wherein N denotes the number of training samples and (U, a) denotes a training sample in the dialog training set TS consisting of a dialog context U and an answer a; the dialog context U consists of a plurality of sentences in a dialog process; each sentence in the dialog context U and the answer a are encoded respectively to obtain their initial characterization vectors; if u_t denotes the t-th sentence in the dialog context U, its initial characterization vector R_{u_t} is expressed as:
R_{u_t} = [e_1^{u_t}, e_2^{u_t}, ..., e_{L_t}^{u_t}]
and the initial characterization vector of answer a is expressed as:
R_a = [e_1^a, e_2^a, ..., e_{L_a}^a]
wherein R_{u_t} ∈ R^{L_t × d_1} and R_a ∈ R^{L_a × d_1}; L_t and L_a respectively denote the number of words remaining in u_t and a after word segmentation and removal of stop words; e_i^{u_t} and e_i^a are the word vectors of the i-th word of u_t and of a respectively, looked up in the pre-trained word-vector matrix E ∈ R^{|D| × d_1}, wherein d_1 denotes the dimension of a word vector and |D| denotes the number of words in the dictionary.
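The encoding of claim 3 amounts to a lookup of pretrained word vectors for the words that survive segmentation and stop-word removal. A minimal numpy sketch, in which `word2id` and `E` are toy stand-ins for the dictionary and the pre-trained word-vector matrix (their names and contents are assumptions, not the patent's):

```python
import numpy as np

d1 = 4  # word-vector dimension (toy value)
word2id = {"hello": 0, "world": 1, "gru": 2}          # toy dictionary D
E = np.random.default_rng(0).normal(size=(len(word2id), d1))  # |D| x d1 matrix

def initial_characterization(tokens, stop_words=frozenset()):
    # Word segmentation is assumed done upstream; stop words and
    # out-of-dictionary words are dropped, the rest are looked up in E.
    ids = [word2id[w] for w in tokens if w in word2id and w not in stop_words]
    return E[ids]  # shape (L_t, d1): one pretrained word vector per kept word

R_u = initial_characterization(["hello", "gru", "the"])  # "the" is unknown, so it is skipped
```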
4. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 3, wherein said step B2 specifically comprises the steps of:
step B21: selecting a number of heads s that divides d_1 exactly; for each sentence u_t in the dialog context, dividing its initial characterization vector R_{u_t} and the initial characterization vector R_a of the answer into s sub-vectors along the last dimension, obtaining the sub-vector sequences R_{u_t}^1, ..., R_{u_t}^s and R_a^1, ..., R_a^s, wherein R_{u_t}^h is the h-th sub-vector of R_{u_t} and R_a^h is the h-th sub-vector of R_a;
step B22: forming each sub-vector of R_{u_t} and the corresponding sub-vector of R_a into a sub-vector pair (R_{u_t}^h, R_a^h), and inputting it into the attention mechanism module to calculate the semantic representation vector O_{u_t}^h of R_{u_t}^h and the semantic representation vector O_a^h of R_a^h, wherein O_{u_t}^h is calculated as follows:
O_{u_t}^h = softmax(R_{u_t}^h (R_a^h)^T) R_a^h
and O_a^h is calculated as follows:
O_a^h = softmax(R_a^h (R_{u_t}^h)^T) R_{u_t}^h
wherein T denotes the matrix transpose operation;
calculating the weighted concatenation of O_{u_t}^1, ..., O_{u_t}^s to obtain the semantic representation vector Û_t of u_t, expressed as follows:
Û_t = [O_{u_t}^1, ..., O_{u_t}^s] W_1
calculating the weighted concatenation of O_a^1, ..., O_a^s to obtain the semantic representation vector Â of a, expressed as follows:
Â = [O_a^1, ..., O_a^s] W_2
wherein W_1, W_2 are training parameters of the multi-head attention mechanism;
step B23: calculating the word similarity matrix of each sentence in the dialog context with the answer; the word similarity matrix M_{1,t} of the t-th sentence u_t in the dialog context with answer a is calculated as follows:
M_{1,t} = Û_t Â^T
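The per-head cross attention and the word similarity matrix of claim 4 can be sketched in numpy as follows. This is an illustrative simplification: the trainable weighted concatenation (W_1, W_2) is replaced by plain concatenation, so only the head-splitting and attention pattern is shown:

```python
import numpy as np

def row_softmax(x):
    # Softmax along the last axis, numerically stabilized
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def multi_head_attention(R_u, R_a, s):
    # Split the last dimension d1 into s sub-vectors (heads), then run
    # cross attention per head: each side attends over the other side.
    U_heads = np.split(R_u, s, axis=-1)
    A_heads = np.split(R_a, s, axis=-1)
    O_u = [row_softmax(Uh @ Ah.T) @ Ah for Uh, Ah in zip(U_heads, A_heads)]
    O_a = [row_softmax(Ah @ Uh.T) @ Uh for Uh, Ah in zip(U_heads, A_heads)]
    # Plain concatenation here; the claim applies trainable weights W1, W2
    return np.concatenate(O_u, axis=-1), np.concatenate(O_a, axis=-1)

def word_similarity_matrix(U_hat, A_hat):
    # M1[i, j]: similarity of word i of the sentence and word j of the answer
    return U_hat @ A_hat.T
```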
5. the method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 4, wherein said step B3 specifically comprises the steps of:
step B31: regarding the initial characterization vector of the answer as a sequence of word vectors, inputting it into a bidirectional GRU network, and calculating the forward and reverse hidden state vectors;
the initial characterization vector R_a of the answer is regarded as the sequence formed by its word vectors e_1^a, ..., e_{L_a}^a; inputting this sequence into the forward GRU in order yields the forward hidden state vectors H_a^f of the answer, and inputting it into the reverse GRU in reverse order yields the reverse hidden state vectors H_a^r, wherein each hidden state vector belongs to R^{d_2} and d_2 is the number of GRU units;
step B32: regarding the initial characterization vector of each sentence in the dialog context as a sequence of word vectors, inputting it into the bidirectional GRU network, and calculating the forward and reverse hidden state vectors; for the t-th sentence u_t in the dialog context, its word-vector sequence e_1^{u_t}, ..., e_{L_t}^{u_t} is input into the forward GRU in order to obtain the forward hidden state vectors H_{u_t}^f of u_t, and into the reverse GRU in reverse order to obtain the reverse hidden state vectors H_{u_t}^r, wherein H_{u_t}^f, H_{u_t}^r ∈ R^{L_t × d_2};
step B33: calculating the forward and reverse semantic representation matrices of each sentence in the dialog context; for the t-th sentence u_t, its forward semantic representation matrix M_{2,t} and reverse semantic representation matrix M_{3,t} with answer a are calculated as follows:
M_{2,t} = H_{u_t}^f (H_a^f)^T
M_{3,t} = H_{u_t}^r (H_a^r)^T
wherein M_{2,t}, M_{3,t} ∈ R^{L_t × L_a}.
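The bidirectional hidden-state computation of claim 5 can be illustrated with a hand-rolled GRU cell in numpy. Bias terms are omitted for brevity and the parameter names (`Wz`, `Uz`, ...) are assumptions of this sketch, not the patent's notation:

```python
import numpy as np

def gru_cell(x, h, P):
    # Standard GRU gates: update gate z, reset gate r, candidate state n
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = sig(P["Wz"] @ x + P["Uz"] @ h)
    r = sig(P["Wr"] @ x + P["Ur"] @ h)
    n = np.tanh(P["Wn"] @ x + P["Un"] @ (r * h))
    return (1.0 - z) * h + z * n

def bidirectional_gru(X, P, d2):
    # X: (L, d1) sequence of word vectors; returns the forward hidden
    # states H^f and reverse hidden states H^r, each of shape (L, d2).
    fwd, h = [], np.zeros(d2)
    for x in X:
        h = gru_cell(x, h, P)
        fwd.append(h)
    bwd, h = [], np.zeros(d2)
    for x in X[::-1]:
        h = gru_cell(x, h, P)
        bwd.append(h)
    return np.array(fwd), np.array(bwd[::-1])

# The semantic representation matrices of step B33 are then
# M2 = H_u_fwd @ H_a_fwd.T and M3 = H_u_rev @ H_a_rev.T
```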
6. the method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 5, wherein said step B4 specifically comprises the steps of:
step B41: merging M_{1,t}, M_{2,t}, and M_{3,t} to obtain the tensor M_t:
M_t = [M_{1,t}, M_{2,t}, M_{3,t}]
step B42: inputting M_t into a two-dimensional convolutional neural network for convolution and pooling, and then into a fully connected layer for dimensionality reduction, obtaining the characterization vector v_t ∈ R^{d_3} fusing the semantic information of u_t and a, wherein d_3 is the dimension after reduction by the fully connected layer;
step B43: for each sentence in the dialog context U, calculating the characterization vector of its semantic information with the answer a, obtaining the sequence [v_1, v_2, ..., v_{L_u}], wherein L_u is the number of sentences in the dialog context U.
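The convolution-and-pooling step of claim 6 can be sketched as follows. Global max pooling stands in for the claimed pooling plus fully connected reduction, so this is an assumption-laden illustration of the operation on the stacked tensor, not the claimed architecture:

```python
import numpy as np

def conv2d_valid(M, K):
    # 'valid' 2-D cross-correlation of one channel M with kernel K
    kh, kw = K.shape
    H, W = M.shape[0] - kh + 1, M.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(M[i:i + kh, j:j + kw] * K)
    return out

def cnn_features(M_t, kernels):
    # M_t: the stacked tensor [M1, M2, M3], shape (3, L_t, L_a);
    # each kernel K has one slice per channel; channel maps are summed,
    # passed through ReLU, and globally max-pooled into one feature.
    feats = []
    for K in kernels:  # K: (3, kh, kw)
        fmap = sum(conv2d_valid(ch, Kc) for ch, Kc in zip(M_t, K))
        feats.append(np.maximum(fmap, 0.0).max())
    return np.array(feats)  # one entry per kernel; a dense layer would map this to d3
```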
7. The method of claim 6, wherein in step B5, the characterization vector sequence [v_1, v_2, ..., v_{L_u}] obtained in step B4 is input into a bidirectional GRU network, the relationship between the dialog context and the answer is modeled by the bidirectional GRU network, and the hidden state vector of the final output is taken as the characterization vector v fusing the context dependencies and semantic information of the dialog and the answer.
8. The method for multi-turn dialog based on a bidirectional GRU network as claimed in claim 7, wherein said step B7 specifically comprises the steps of:
step B71: inputting the final characterization vector v into the fully connected layer, and using softmax normalization to calculate the probability that the answer belongs to each category, according to the following formulas:
y = W_s·v + b_s
g_c(U, a) = softmax(y)
wherein W_s is the weight matrix of the fully connected layer, b_s is the bias term of the fully connected layer, g_c(U, a) is the probability that the answer a of the training sample (U, a) processed in step B1 matches the dialog context U, 0 ≤ g_c(U, a) ≤ 1, and c ∈ {correct, wrong};
step B72: calculating the loss value using cross entropy as the loss function, updating the learning rate with the gradient optimization algorithm AdaGrad, and training the model by minimizing the loss function through back-propagation iterative updates of the model parameters;
the minimized loss function Loss is calculated as follows:
Loss = −∑_{i=1}^{N} [ y_i·log g(U_i, a_i) + (1 − y_i)·log(1 − g(U_i, a_i)) ]
wherein (U_i, a_i) denotes the i-th training sample in the dialog training set TS and y_i ∈ {0, 1} is its class label.
9. A multi-turn dialog system employing the method of any one of claims 1-8, comprising:
a training set construction module for collecting dialog context and answer data and constructing a dialog training set TS;
a model training module for training the deep learning network model incorporating the bidirectional GRU network using the dialog training set TS; and
a multi-turn dialog module for conversing with the user, inputting the user's question into the trained deep learning network model, and outputting the best-matching answer.
CN202010067240.9A 2020-01-20 2020-01-20 Multi-turn dialogue method and system based on bidirectional GRU network Active CN111274375B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010067240.9A CN111274375B (en) 2020-01-20 2020-01-20 Multi-turn dialogue method and system based on bidirectional GRU network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010067240.9A CN111274375B (en) 2020-01-20 2020-01-20 Multi-turn dialogue method and system based on bidirectional GRU network

Publications (2)

Publication Number Publication Date
CN111274375A true CN111274375A (en) 2020-06-12
CN111274375B CN111274375B (en) 2022-06-14

Family

ID=70996874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010067240.9A Active CN111274375B (en) 2020-01-20 2020-01-20 Multi-turn dialogue method and system based on bidirectional GRU network

Country Status (1)

Country Link
CN (1) CN111274375B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434143A (en) * 2020-11-20 2021-03-02 西安交通大学 Dialog method, storage medium and system based on hidden state constraints of a GRU (gated recurrent unit)
CN112632236A (en) * 2020-12-02 2021-04-09 中山大学 Improved sequence matching network-based multi-turn dialogue model
CN112818105A (en) * 2021-02-05 2021-05-18 江苏实达迪美数据处理有限公司 Multi-turn dialogue method and system fusing context information
CN113157855A (en) * 2021-02-22 2021-07-23 福州大学 Text summarization method and system fusing semantic and context information
WO2021147405A1 (en) * 2020-08-31 2021-07-29 平安科技(深圳)有限公司 Customer-service statement quality detection method and related device
CN114443827A (en) * 2022-01-28 2022-05-06 福州大学 Local information perception dialogue method and system based on pre-training language model
CN114490991A (en) * 2022-01-28 2022-05-13 福州大学 Dialog structure perception dialog method and system based on fine-grained local information enhancement
CN114564568A (en) * 2022-02-25 2022-05-31 福州大学 Knowledge enhancement and context awareness based dialog state tracking method and system
CN115276697A (en) * 2022-07-22 2022-11-01 交通运输部规划研究院 Coast radio station communication system integrated with intelligent voice
CN116128438A (en) * 2022-12-27 2023-05-16 江苏巨楷科技发展有限公司 Intelligent community management system based on big data record information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108874972A (en) * 2018-06-08 2018-11-23 青岛里奥机器人技术有限公司 A multi-turn emotion dialogue method based on deep learning
CN109460463A (en) * 2018-11-15 2019-03-12 平安科技(深圳)有限公司 Model training method, device, terminal and storage medium based on data processing
CN109933659A (en) * 2019-03-22 2019-06-25 重庆邮电大学 A vehicle-mounted multi-turn dialogue method for the travel domain
CN110020015A (en) * 2017-12-29 2019-07-16 中国科学院声学研究所 An answer generation method and system for dialogue systems
US20190385051A1 (en) * 2018-06-14 2019-12-19 Accenture Global Solutions Limited Virtual agent with a dialogue management system and method of training a dialogue management system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020015A (en) * 2017-12-29 2019-07-16 中国科学院声学研究所 An answer generation method and system for dialogue systems
CN108874972A (en) * 2018-06-08 2018-11-23 青岛里奥机器人技术有限公司 A multi-turn emotion dialogue method based on deep learning
US20190385051A1 (en) * 2018-06-14 2019-12-19 Accenture Global Solutions Limited Virtual agent with a dialogue management system and method of training a dialogue management system
CN109460463A (en) * 2018-11-15 2019-03-12 平安科技(深圳)有限公司 Model training method, device, terminal and storage medium based on data processing
CN109933659A (en) * 2019-03-22 2019-06-25 重庆邮电大学 A vehicle-mounted multi-turn dialogue method for the travel domain

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宋皓宇等 (Song Haoyu et al.): "基于DQN的开放域多轮对话策略学习" ("DQN-based policy learning for open-domain multi-turn dialogue"), 《中文信息学报》 (Journal of Chinese Information Processing) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021147405A1 (en) * 2020-08-31 2021-07-29 平安科技(深圳)有限公司 Customer-service statement quality detection method and related device
CN112434143A (en) * 2020-11-20 2021-03-02 西安交通大学 Dialog method, storage medium and system based on hidden state constraints of a GRU (gated recurrent unit)
CN112434143B (en) * 2020-11-20 2022-12-09 西安交通大学 Dialog method, storage medium and system based on hidden state constraints of a GRU (gated recurrent unit)
CN112632236A (en) * 2020-12-02 2021-04-09 中山大学 Improved sequence matching network-based multi-turn dialogue model
CN112818105A (en) * 2021-02-05 2021-05-18 江苏实达迪美数据处理有限公司 Multi-turn dialogue method and system fusing context information
CN112818105B (en) * 2021-02-05 2021-12-07 江苏实达迪美数据处理有限公司 Multi-turn dialogue method and system fusing context information
CN113157855A (en) * 2021-02-22 2021-07-23 福州大学 Text summarization method and system fusing semantic and context information
CN114443827A (en) * 2022-01-28 2022-05-06 福州大学 Local information perception dialogue method and system based on pre-training language model
CN114490991A (en) * 2022-01-28 2022-05-13 福州大学 Dialog structure perception dialog method and system based on fine-grained local information enhancement
CN114564568A (en) * 2022-02-25 2022-05-31 福州大学 Knowledge enhancement and context awareness based dialog state tracking method and system
CN115276697A (en) * 2022-07-22 2022-11-01 交通运输部规划研究院 Coast radio station communication system integrated with intelligent voice
CN116128438A (en) * 2022-12-27 2023-05-16 江苏巨楷科技发展有限公司 Intelligent community management system based on big data record information

Also Published As

Publication number Publication date
CN111274375B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN111274375B (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN108681610B (en) Generative multi-turn chat dialogue method, system and computer-readable storage medium
CN112667818B (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN109614471B (en) Open type problem automatic generation method based on generation type countermeasure network
WO2020140487A1 (en) Speech recognition method for human-machine interaction of smart apparatus, and system
CN108363695B (en) User comment attribute extraction method based on bidirectional dependency syntax tree representation
CN110489567B (en) Node information acquisition method and device based on cross-network feature mapping
CN110222163A (en) An intelligent question answering method and system fusing CNN and bidirectional LSTM
CN113297364B (en) Natural language understanding method and device in dialogue-oriented system
CN111274398A (en) Method and system for analyzing comment emotion of aspect-level user product
CN112232087B (en) Specific aspect emotion analysis method of multi-granularity attention model based on Transformer
CN112527966B (en) Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism
CN114443827A (en) Local information perception dialogue method and system based on pre-training language model
CN111966800A (en) Emotional dialogue generation method and device and emotional dialogue model training method and device
CN115964467A (en) Visual situation fused rich semantic dialogue generation method
CN114818703B (en) Multi-intention recognition method and system based on BERT language model and TextCNN model
CN113239174A (en) Hierarchical multi-round conversation generation method and device based on double-layer decoding
CN113807079A (en) End-to-end entity and relation combined extraction method based on sequence-to-sequence
CN114328866A (en) Strong anthropomorphic intelligent dialogue robot with smooth and accurate response
CN113239678B (en) Multi-angle attention feature matching method and system for answer selection
CN112667788A (en) Novel BERTEXT-based multi-round dialogue natural language understanding model
CN116595985A (en) Method for assisting in enhancing emotion recognition in dialogue based on generated common sense
CN113705197B (en) Fine granularity emotion analysis method based on position enhancement
CN116150334A (en) Chinese co-emotion sentence training method and system based on UniLM model and Copy mechanism
CN115422945A (en) Rumor detection method and system integrating emotion mining

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant