Disclosure of Invention
In order to solve the above technical problem, the present application provides a method for training a reply information generation model and a method for generating reply information using the trained model, so that the generated reply information is more accurate and better conforms to the natural language interaction habits of users.
In a first aspect, a method for training a reply information generation model is provided, including:
acquiring a first input sequence, wherein the first input sequence is obtained by converting input information of a current round of conversation in a training corpus;
encoding the first input sequence by using a first encoder, with a first decoding hidden state vector as an initial value of a hidden layer of the first encoder, to obtain a second encoding hidden state vector, wherein the first encoder is an encoder based on a recurrent neural network (RNN) model, and the first decoding hidden state vector is the state value of the last step in the hidden layer of a decoder in the previous round of conversation;
decoding the second coding hidden state vector by adopting a decoder to obtain a first output sequence, wherein the decoder is based on an RNN model;
calculating the error between a standard output sequence and the first output sequence, wherein the standard output sequence is obtained by converting reply information of the current round of conversation in the training corpus;
updating parameters of the first encoder and the decoder according to the error if the error is above a preset end threshold.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before the step of encoding the first input sequence by using the first encoder with the first decoding hidden state vector as an initial value of a hidden layer of the first encoder, the method further includes:
obtaining a semantic guide input sequence, wherein the semantic guide input sequence is obtained by converting semantic guide words, and the semantic guide words are words representing the semantics of the reply information of the current round in the training corpus;
encoding the semantic guidance input sequence by adopting a second encoder to obtain a semantic guidance hidden state vector, wherein the second encoder is an RNN model-based encoder, and the semantic guidance hidden state vector is the state value of the last step in a hidden layer of the second encoder;
horizontally connecting the semantic guidance hidden state vector with the first decoding hidden state vector to obtain a first controlled hidden state vector;
the step of encoding the first input sequence by using the first encoder, with the first decoding hidden state vector as the initial value of the hidden layer of the first encoder, to obtain the second encoding hidden state vector specifically includes:
encoding the first input sequence by using the first encoder, with the first controlled hidden state vector as the initial value of the hidden layer of the first encoder, to obtain the second encoded hidden state vector.
With reference to the first implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the step of obtaining the semantic guidance input sequence includes:
acquiring reply information of the current round of conversation in the training corpus;
performing word segmentation on the reply information;
extracting semantic guide words from the word segmentation result;
and converting the semantic guide words into semantic guide input sequences.
With reference to the first or second implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the step of decoding the second encoded hidden state vector by using a decoder to obtain a first output sequence includes:
calculating attention assignment weights α_ji of the decoding hidden state vector s_j at the j-th time in the decoder with respect to the semantic guidance hidden state vector h_0 in the second encoder and to all encoded hidden state vectors {h_1, ..., h_i, ..., h_n} in the first encoder, respectively, where i = 0, 1, 2, ..., n; j = 1, 2, ..., m; n is the total number of encoded hidden state vectors h_i in the first encoder, and m is the total number of output values y_j in the output sequence of the decoder;
calculating the weights α_ji with a softmax function to obtain a weighted average c_j, where j = 1, 2, ..., m;
decoding the second encoded hidden state vector with the decoder to obtain a first output sequence {y_1', ..., y_j', ..., y_m'}, where y_j' = g(y_{j-1}, s_j, c_j) and s_j = f(y_{j-1}, s_{j-1}, c_j), f is a nonlinear activation function, g is a softmax function, y_{j-1} is the input value of the input layer of the decoder at the j-th time, and s_j is the decoding hidden state vector of the hidden layer of the decoder at the j-th time.
With reference to the first aspect and the first to third implementation manners, in a fourth possible implementation manner of the first aspect, the training method further includes:
and if the error is lower than or equal to a preset end threshold value, determining current parameters of the first encoder and the decoder as parameters of a reply information generation model.
In a second aspect, a reply information generation method is provided, including:
inputting a second input sequence into a reply information generation model for encoding and decoding, with a third decoding hidden state vector as an initial value of a hidden layer of a first encoder, to obtain a second output sequence; wherein the third decoding hidden state vector is the state value of the last step in a hidden layer of a decoder of the reply information generation model in the previous round of conversation, the second input sequence is obtained by converting second input information input by a user in the current round of conversation, and the reply information generation model is obtained by training with the training method of the reply information generation model according to the first aspect or any one of its implementation manners;
and converting the second output sequence into a second reply message.
With reference to the second aspect, in a first possible implementation manner of the second aspect, before the step of inputting the second input sequence into the reply information generation model for encoding and decoding by using the third decoding hidden state vector as an initial value of the hidden layer of the first encoder to obtain the second output sequence, the method further includes:
acquiring a keyword input sequence, wherein the keyword input sequence is obtained by converting a preset second keyword;
encoding the keyword input sequence by adopting a second encoder to obtain a keyword hidden state vector, wherein the second encoder is an encoder based on a recurrent neural network (RNN) model, and the keyword hidden state vector is the state value of the last step in a hidden layer of the second encoder;
horizontally connecting the keyword hidden state vector with the third decoding hidden state vector to obtain a second controlled hidden state vector;
the step of inputting the second input sequence into the reply information generation model for encoding and decoding, with the third decoding hidden state vector as the initial value of the hidden layer of the first encoder, to obtain the second output sequence specifically includes:
inputting the second input sequence into the reply information generation model for encoding and decoding, with the second controlled hidden state vector as the initial value of the hidden layer of the first encoder, to obtain the second output sequence.
With reference to the first implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the step of obtaining the keyword input sequence includes:
acquiring second input information input by a user in the current round of conversation;
extracting a first keyword from second input information, wherein the first keyword is a real word in the second input information;
acquiring a second keyword associated with the first keyword from a preset statistical library, wherein the statistical library is constructed on the basis of input information and reply information in the training corpus;
and converting the second keyword into a keyword input sequence.
In a third aspect, a training apparatus for generating a model from reply information is provided, including:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a first input sequence, and the first input sequence is obtained by converting input information of a current round of conversation in a training corpus;
the training unit is used for encoding the first input sequence by adopting a first encoder, with a first decoding hidden state vector as an initial value of a hidden layer of the first encoder, to obtain a second encoding hidden state vector; decoding the second encoding hidden state vector by adopting a decoder to obtain a first output sequence; calculating the error between a standard output sequence and the first output sequence; and updating parameters of the first encoder and the decoder according to the error if the error is above a preset end threshold; wherein the first encoder is an RNN model-based encoder, the first decoding hidden state vector is the state value of the last step in a hidden layer of a decoder in the previous round of conversation, the decoder is an RNN model-based decoder, and the standard output sequence is obtained by converting reply information of the current round of conversation in the corpus.
In a fourth aspect, there is provided a reply information generation apparatus including:
the generating unit is used for inputting the second input sequence into the reply information generating model for encoding and decoding by taking the third decoding hidden state vector as an initial value of the hidden layer of the first encoder to obtain a second output sequence; the third decoding hidden state vector is a state value of the last step in a hidden layer of a decoder of a reply information generation model in the previous round of conversation, the second input sequence is obtained by converting second input information input by a user in the current round of conversation, and the reply information generation model is obtained by training by adopting any one of the training methods of the reply information generation model in the first aspect;
and the conversion unit is used for converting the second output sequence into second reply information.
According to the method for training the reply information generation model provided by the present application, the model is trained with training corpora of multiple rounds of conversations, and the reply information of the previous round of conversation in the corpus is introduced into the reply-generation process of the next round, so that the trained reply information generation model is obtained. The trained model is then used to generate reply information. As in the training process, when generating reply information, the state value of the last step in the hidden layer of the decoder in the previous round of conversation is used as the initial value of the hidden layer of the first encoder in the next round; the second input sequence of the next round is encoded and then decoded by the decoder. In this way, the conversation information of the previous round or rounds is introduced into the generation of the next round's reply, so that more accurate reply information is generated and the natural language interaction habits of users are better met.
Detailed Description
The following provides a detailed description of the embodiments of the present application.
In order to solve the problem that reply information generated only from the input information of the current round has low accuracy, reply information generation methods based on a time sliding window have been developed: a time sliding window is set to extract the session records within a certain time range before the current session, features are extracted from these records and supplemented into the session information of the current round, and the reply information of the current round is then generated from the supplemented session information. However, such methods have several problems. First, the length of the time sliding window is difficult to determine. Second, extracting features from the session records and supplementing them into the current round's session information is very difficult, because spoken-language phenomena, such as omitted sentence components, are common in natural language expression; there is as yet no satisfactory solution for extracting effective features from session records and accurately supplementing them at the appropriate positions in the current round's session information. Therefore, reply information generation methods based on a time sliding window also suffer from low reply accuracy.
Therefore, the application provides a new reply information generation method, which mainly utilizes a reply information generation model comprising a first encoder and a decoder, takes the state value of the last step in the hidden layer of the decoder in the previous round of conversation as the initial value of the hidden layer of the first encoder in the next round of conversation, encodes the input sequence of the next round, and then decodes by the decoder, thereby introducing the conversation information of the previous round or the previous rounds into the generation process of the reply information of the next round of conversation, and further generating more accurate reply information.
In this application, the reply information generation model includes a first encoder and a decoder, and the first encoder is an RNN model-based encoder and the decoder is an RNN model-based decoder, that is, the first encoder and the decoder are both implemented by an RNN model.
The RNN models, classified according to their cells, include the simple RNN model, the LSTM (Long Short-Term Memory) model, the GRU (Gated Recurrent Unit) model, the bidirectional RNN (Bidirectional Recurrent Neural Network) model, and the like, where the bidirectional RNN models include the bidirectional simple RNN model, the bidirectional LSTM model, and the bidirectional GRU model.
The first encoder and decoder in the reply information generation model in the present application may be based on any of the RNN models described above. Wherein the first encoder comprises an input layer and a hidden layer, and the decoder comprises an input layer, a hidden layer and an output layer. It should be noted here that the first encoder may not include the output layer, and may also include the output layer, which is not limited in this application. Even if the first encoder comprises an output layer, the output values of its output layer are not used in the scheme of the present application, so in the most basic case, the first encoder in the present application may comprise only an input layer and a hidden layer.
Experiments show that, compared with a unidirectional RNN model, when the first encoder and the decoder adopt a bidirectional RNN model, the reply information generated by the resulting reply information generation model is more accurate.
For convenience of understanding, the present embodiment first describes a training process of the reply information generation model, and then describes a process of generating reply information using the trained reply information generation model.
The training phase is mainly used for determining parameters in the reply information generation model. Taking a simple RNN model as an example, the main parameters determined in the training phase include: the weight U from the input layer to the hidden layer, the weight W of the hidden state vector transfer between the hidden layers, and the weight V from the hidden layer to the output layer.
Referring to fig. 2 and 3, in a first embodiment of the present application, a method for training a reply information generation model is provided, which includes steps S100-S500.
S100: and acquiring a first input sequence, wherein the first input sequence is obtained by converting input information of the current round of conversation in the training corpus.
In the present application, the corpus includes at least one set of multi-turn corpus, the multi-turn corpus includes at least two turns of session information, and the session information of each turn is a data pair of "input information-reply information". For example, a set of multi-turn corpora may be:
the first round of input information: i want to eat the mango in Yunnan.
First round reply information: the mango fruit of Yunnan is not wrong.
And inputting information in the second round: can you buy it on the network?
Second round reply information: help you find an internet shop bar selling yunnan mango?
The third round of input information: preferably o.
The third round replies with the message: the sales of XX [ shop name ] are high.
In the actual training process, the training corpus usually contains many groups of multiple rounds of corpuses, for example, the training corpus may include one thousand groups, ten thousand groups, or even more.
In step S100, the first input sequence is obtained by converting input information of a current turn of conversation in the corpus. Specifically, each word of the input information may be converted into a word vector by using a preset dictionary, where the preset dictionary includes a corresponding relationship between each word and the word vector. The word vector for each word can be trained in advance using existing methods, for example, word2vec, which is a tool for word vector calculation.
For example, assume that the current round is the second round, and the input information of the second round is: can you buy it on the network?
Converting the input information into a first input sequence:
web → x1; up → x2; energy → x3; buy → x4; to → x5; do → x6; ? → x7. A first input sequence {x1, x2, x3, x4, x5, x6, x7} is obtained.
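For illustration, a minimal Python sketch of this dictionary-based conversion is given below; the words follow the example above, while the dictionary entries, vector dimension and values are hypothetical placeholders rather than actually trained word2vec vectors.

```python
# A minimal sketch of converting input words into the input sequence {x1,...,xn}.
# The dictionary contents and the 4-dimensional vectors are hypothetical.
import numpy as np

preset_dictionary = {  # hypothetical preset dictionary: word -> word vector
    "web":  np.array([0.1, 0.3, -0.2, 0.5]),
    "up":   np.array([0.0, 0.7, 0.1, -0.4]),
    "energy": np.array([0.6, -0.1, 0.2, 0.2]),
    "buy":  np.array([-0.3, 0.4, 0.5, 0.1]),
    "to":   np.array([0.2, 0.2, -0.6, 0.0]),
    "do":   np.array([0.4, -0.5, 0.3, 0.3]),
    "?":    np.array([0.0, 0.0, 0.9, -0.1]),
}

def to_input_sequence(words):
    """Look each word up in the preset dictionary to form {x1, ..., xn}."""
    return [preset_dictionary[w] for w in words]

first_input_sequence = to_input_sequence(["web", "up", "energy", "buy", "to", "do", "?"])
print(len(first_input_sequence))  # 7 word vectors: x1 ... x7
```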
S200: and coding the first input sequence by adopting a first coder to obtain a second coding hidden state vector by taking the first decoding hidden state vector as an initial value of a hidden layer of the first coder, wherein the first coder is a coder based on an RNN (radio network node) model, and the first decoding hidden state vector is a state value of the last step in the hidden layer of a decoder in the previous round of conversation.
The hidden state is the state of the hidden layer of an RNN model-based encoder or decoder at a certain step (which may be understood as a certain time), and the value of that state, namely the hidden state vector, can be represented by a vector.
Here, the process of training the reply information generation model using the input information and the reply information of the previous round of session in the corpus is described, taking the first round of conversation in the earlier corpus example as an example.
A randomly set vector is used as the initial value of the hidden layer of the first encoder, and the word vector of the first word "I" of the first round of input information is used as the input value of the input layer of the first encoder; the encoded hidden state vector at the first time is obtained by calculation. In this step, the initial value of the first encoder's hidden layer may be set randomly or preset by a user; for example, it is commonly preset to [0, 0, 0, ..., 0]. When the initial value is set randomly, values may be drawn according to a uniform distribution, a truncated normal distribution, or the like. Then, the encoded hidden state vector at the first time is used as the input value of the hidden layer of the first encoder, and the word vector of the second word "want" of the first round of input information is used as the input value of the input layer; the encoded hidden state vector at the second time is calculated. This calculation proceeds until the word vector of the eighth word (the last word) of the first round of input information is used as the input value of the input layer of the first encoder and the encoded hidden state vector at the eighth time, namely the first encoded hidden state vector, is calculated. At this point, the first encoder completes the encoding process for the input sequence of the first round of the session.
The first encoded hidden state vector is used as the initial value of the decoder's hidden layer, and the word vector of the end character "。" is used as the input value of the decoder's input layer; the decoding hidden state vector at the first time is obtained by calculation. Then, the decoding hidden state vector at the first time is used as the input value of the decoder's hidden layer, and the word vector of the first word of the first round of reply information is used as the input value of the decoder's input layer; the decoding hidden state vector at the second time is calculated. This calculation proceeds until the word vector of the eighth word (the last word) of the first round of reply information is used as the input value of the decoder's input layer and the decoding hidden state vector at the ninth time, namely the first decoding hidden state vector, is calculated; this is the state value of the last step in the hidden layer of the decoder in the first round of conversation. At this point, the decoder completes the decoding process for the first round of the session.
In step S200, the process of the first encoder encoding the first input sequence is similar to the above process of encoding the input information of the first round of session in the corpus. The difference is that a random vector is not used as the initial value of the first encoder's hidden layer; instead, the first decoding hidden state vector is used as the initial value. The first encoder then encodes the first input sequence to obtain the second encoded hidden state vector.
S300: and decoding the second coding hidden state vector by adopting a decoder to obtain a first output sequence, wherein the decoder is based on an RNN model.
In step S300, the process of decoding the second encoded hidden state vector by the decoder is similar to the process of decoding the reply message of the first round session in the corpus. Firstly, taking a second coding hidden state vector as an initial value of a hidden layer of a decoder, taking a word vector of an end character of current round input information as an input value of an input layer of the decoder, and calculating to obtain a decoding hidden state vector at a first moment; and calculating to obtain an output vector of the first moment by using the decoding hidden state vector of the first moment. Then, the decoding hidden state vector at the first moment is taken as the input value of the hidden layer of the decoder, the word vector of the first word of the current turn reply information is taken as the input value of the input layer of the decoder, and the decoding hidden state vector at the second moment is obtained through calculation; and calculating the output vector of the second moment by using the decoding hidden state vector of the second moment. According to the calculation method, until the word vector of the last word of the current round reply information is used as the input value of the decoder input layer, the decoding hidden state vector at the last moment is obtained through calculation, namely the second decoding hidden state vector; and calculating by using the second decoding hidden state vector to obtain an output vector of the last moment. At this point, the decoder completes the decoding process for the second encoded hidden state vector of the current round. The set of output vectors at all time instants is the first output sequence.
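A minimal numpy sketch of the encode-decode flow of steps S200 and S300 follows, assuming a simple RNN cell with the shared parameters U, W and V of fig. 3. All dimensions, word vectors and parameter values are illustrative random placeholders, not trained values.

```python
import numpy as np

def rnn_step(U, W, x, h_prev):
    """One simple-RNN step: h = f(U·x + W·h_prev), with f = tanh."""
    return np.tanh(U @ x + W @ h_prev)

def encode(U, W, input_seq, h_init):
    """Encode {x1,...,xn}; h_init is the previous round's last decoder state."""
    h = h_init
    for x in input_seq:
        h = rnn_step(U, W, x, h)
    return h  # the second encoded hidden state vector

def decode_teacher_forced(U, W, V, target_inputs, s_init):
    """Decode with teacher forcing: end character first, then reply words."""
    s, outputs = s_init, []
    for y_prev in target_inputs:
        s = rnn_step(U, W, y_prev, s)
        logits = V @ s
        outputs.append(np.exp(logits) / np.exp(logits).sum())  # softmax(V·s_j)
    return outputs  # the first output sequence {y1', ..., ym'}

rng = np.random.default_rng(0)
d, h_dim, vocab = 4, 5, 10  # illustrative word-vector, hidden and vocab sizes
U = rng.normal(size=(h_dim, d))
W = rng.normal(size=(h_dim, h_dim))
V = rng.normal(size=(vocab, h_dim))
xs = [rng.normal(size=d) for _ in range(7)]   # first input sequence
s_prev_round = rng.normal(size=h_dim)         # first decoding hidden state vector
h_z = encode(U, W, xs, s_prev_round)
ys = [rng.normal(size=d) for _ in range(4)]   # end char + reply word vectors
first_output_sequence = decode_teacher_forced(U, W, V, ys, h_z)
```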
Referring to fig. 3, fig. 3 is a schematic diagram illustrating the encoding and decoding steps of the reply information generation model during training, arranged in time order, where the first encoder in the reply information generation model is a simple RNN model and the decoder is also a simple RNN model.
The first encoder includes an input layer and a hidden layer. The set of input values in the input layer is denoted {x_1, ..., x_i, x_{i+1}, ...}; the set of hidden state vectors in the hidden layer is denoted {h_1, ..., h_i, h_{i+1}, ...}.
The decoder includes an input layer, a hidden layer, and an output layer. The set of input values in the input layer is denoted {y_1, ..., y_j, y_{j+1}, ...}; the set of hidden state vectors in the hidden layer is denoted {s_1, ..., s_j, s_{j+1}, ...}; the set of output values in the output layer is denoted {y_1', ..., y_j', y_{j+1}', ...}.
x_i is the input value of the input layer of the first encoder at the i-th time.
h_i is the encoded hidden state vector of the hidden layer of the first encoder at the i-th time, where h_i = f(x_i, h_{i-1}) and f is generally a nonlinear activation function, such as tanh or ReLU. One common calculation of h_i is: h_i = f(U·x_i + W·h_{i-1}).
y_{j-1} is the input value of the input layer of the decoder at the j-th time.
s_j is the decoding hidden state vector of the hidden layer of the decoder at the j-th time, s_j = f(y_{j-1}, s_{j-1}), where f is generally a nonlinear activation function, such as tanh or ReLU. One common calculation of s_j is: s_j = f(U·y_{j-1} + W·s_{j-1}).
y_j' is the output value of the output layer of the decoder at the j-th time, y_j' = g(s_j), where g is a softmax function. More specifically, one common calculation of y_j' is: y_j' = softmax(V·s_j).
h_z is the first encoded hidden state vector.
s_z is the first decoding hidden state vector.
U is the weight from the input layer to the hidden layer in the first encoder and in the decoder, and can be represented by a matrix.
W is the weight for the hidden state transfer from one step to the next in the hidden layer of the first encoder and of the decoder, and can be represented by a matrix.
V is the weight from the hidden layer to the output layer in the decoder, and can be represented by a matrix.
U, W and V are the parameters of the reply information generation model. After the model is unrolled in the time dimension (see fig. 3), the values of U, W and V are kept unchanged at every time. That is, the parameters U, W and V are shared across time in the first encoder and the decoder.
At the beginning of training the reply information generation model, the initial values of the parameters U, W and V can be randomly selected or manually preset. During training, the values of U, W and V are continuously updated. When the training of the reply information generation model is completed, U, W and V are determined such that the error between the standard output sequence and the first output sequence is minimized.
S400: and calculating the error between a standard output sequence and the first output sequence, wherein the standard output sequence is obtained by converting reply information of the current round of conversation in the training corpus.
The standard output sequence is obtained by converting reply information of the current round of conversation in the training corpus, and the first output sequence is obtained by converting the reply information predicted by the reply information generation model under the current parameters. That is, the standard output sequence may be understood as the correct reply information, the first output sequence may be understood as the predicted reply information, and by calculating the error between the standard output sequence and the predicted reply information, the difference between the predicted reply information and the correct reply information generated by the reply information generation model under the current parameter may be observed, so as to adjust the parameter of the reply information generation model according to the error.
Specifically, each word of the reply information of the current round of session in the corpus may be converted into a word vector by using a preset dictionary, where the preset dictionary may be the preset dictionary in the step S100.
The error of the standard output sequence and the first output sequence can be calculated by using known error calculation methods commonly used in the process of training the RNN model. For example, cross-entropy (cross-entropy) loss functions may be used for the calculation.
Specifically, taking the cross-entropy loss function as an example and referring to fig. 3, the standard output sequence is {y1, y2, y3, y4} and the first output sequence is {y1', y2', y3', y4'}. Assuming the output sequence contains n values in total, the total error between the standard output sequence and the first output sequence can be expressed as:
L(y, y') = -(1/n) · Σ_j (y_j · log y_j')
where y_j represents the j-th value in the standard output sequence and y_j' represents the j-th output value in the first output sequence.
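As a sketch, the loss above can be computed as follows; the one-hot standard outputs and predicted distributions are hypothetical examples, and a small epsilon is added for numerical stability.

```python
import numpy as np

def cross_entropy_error(standard_seq, output_seq):
    """L(y, y') = -(1/n) * sum_j y_j · log(y_j'), per the formula above."""
    n = len(standard_seq)
    total = 0.0
    for y, y_pred in zip(standard_seq, output_seq):
        total += np.sum(y * np.log(y_pred + 1e-12))  # epsilon avoids log(0)
    return -total / n

# Hypothetical one-hot standard outputs and predicted distributions (vocab of 3)
standard = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
predicted = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1])]
print(cross_entropy_error(standard, predicted))
```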
S500: updating parameters of the first encoder and the decoder according to the error if the error is above a preset end threshold.
Here, for the step of updating the parameters of the first encoder and the decoder according to the error, an existing deep-learning parameter-update method may be employed; for example, the back-propagation through time (BPTT) algorithm may be used to update the parameters of the first encoder and the decoder.
The specific parameters to be updated differ according to the RNN model adopted by the first encoder and the decoder. For example, if the first encoder and the decoder are implemented by a simple RNN model, the parameters updated here may include U, W and V.
For another example, if the first encoder and the decoder are implemented by a bidirectional RNN model, referring to fig. 4, the parameters updated here may include U, W, V, U', W', and V'. U is the weight from the input layer to the hidden layer during forward encoding or forward decoding; W is the weight from the previous hidden state to the next hidden state during forward encoding or forward decoding; V is the weight from the hidden layer to the output layer during forward encoding or forward decoding. U' is the weight from the input layer to the hidden layer during backward encoding or backward decoding; W' is the weight from the previous hidden state to the next hidden state during backward encoding or backward decoding; V' is the weight from the hidden layer to the output layer during backward encoding or backward decoding. U, W, V, U', W', and V' may each be represented by a matrix.
Optionally, the training method for the reply information generation model of this embodiment further includes:
s600: and if the error is lower than or equal to a preset end threshold value, determining current parameters of the first encoder and the decoder as parameters of a reply information generation model.
Here, if the error is less than or equal to the preset end threshold, the training of the reply information generation model is complete, and the current parameters of the first encoder and the decoder are determined as the parameters of the reply information generation model; a reply information generation model with determined parameter values, namely the trained reply information generation model, is thereby obtained. The model can then be used to make predictions and generate reply information.
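The overall control flow of steps S400 to S600 can be sketched as follows. All helpers here are hypothetical stubs standing in for the real encoding, decoding, and BPTT computations, and the threshold value is illustrative.

```python
# Schematic control flow for S400-S600; compute_error and bptt_update are
# hypothetical stubs, not the actual model computations.
END_THRESHOLD = 0.05  # illustrative preset end threshold

def compute_error(params, sample):
    # stub: would run S100-S400 (encode, decode, cross-entropy error)
    return sample["error"] * params["scale"]

def bptt_update(params, error, lr=0.1):
    # stub: would apply gradients computed by back-propagation through time
    return {"scale": params["scale"] - lr * error}

def train(samples, params):
    for sample in samples:
        error = compute_error(params, sample)
        if error > END_THRESHOLD:
            params = bptt_update(params, error)   # S500: keep updating
        else:
            return params                         # S600: parameters determined
    return params

final_params = train([{"error": 0.4}, {"error": 0.2}, {"error": 0.04}], {"scale": 1.0})
print(final_params)
```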
Referring to fig. 5 and 6, in a second embodiment of the present application, a training method of a reply information generation model is provided, including steps S100, S710, S720, S730, S201, S300, S400, and S500.
S100: and acquiring a first input sequence, wherein the first input sequence is obtained by converting input information of the current round of conversation in the training corpus.
In this embodiment, for step S100, reference may be made to the description of step S100 in the first embodiment, which is not repeated here.
S710: acquiring a semantic guidance input sequence, wherein the semantic guidance input sequence is obtained by converting semantic guide words, and the semantic guide words are words representing the semantics of the reply information of the current round in the training corpus. The semantic guidance input sequence may be represented as {z_1, ..., z_k, z_{k+1}, ...}.
Specifically, referring to fig. 7, the step of obtaining the semantic guidance input sequence may include:
s711: acquiring reply information of the current round of conversation in the training corpus;
s712: performing word segmentation on the reply information;
s713: extracting semantic guide words from the word segmentation result;
s714: and converting the semantic guide words into semantic guide input sequences.
In step S712, the word segmentation may be implemented by using an existing word segmentation method or tool, for example, a word segmentation method based on string matching, on understanding, or on statistics, or a word segmentation tool such as jieba or THULAC, which is not limited in this application.
For example, the reply information of the current round of session obtained from the corpus is "help you find an internet shop bar selling yunnan mango?". The reply information is segmented to obtain the segmentation result: help / you / find / sell / yunnan / mango / internet shop / bar. Here, "help", "find" and "sell" are verbs, "you" is a pronoun, "Yunnan", "mango" and "internet shop" are nouns, and "bar" is a modal particle.
In step S713, as described above, the semantic guide words are words that represent the semantics of the current round's reply information in the corpus. In general, a real word (for example, a verb, noun, or adjective) in a sentence can characterize the semantics of the sentence to some extent. Thus, in one implementation, the semantic guide words may be extracted according to the part of speech of the words in the segmentation result.
In step S714, the semantic guide words are converted into the semantic guidance input sequence: each semantic guide word may be converted into a word vector by using a preset dictionary, thereby obtaining the semantic guidance input sequence. The preset dictionary may be the preset dictionary in step S100 of the first embodiment.
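As a sketch of steps S711 to S714, the following uses the jieba toolkit mentioned above to segment a reply and keep the real words (nouns, verbs, adjectives) as semantic guide words. The sentence, the part-of-speech filter, and the final vector lookup are illustrative; jieba's tag set is assumed, where tags starting with "n", "v" and "a" mark nouns, verbs and adjectives.

```python
# Sketch of S711-S714: segment the reply, keep real words as semantic guide
# words, then map them to word vectors (lookup step omitted here).
import jieba.posseg as pseg

CONTENT_POS_PREFIXES = ("n", "v", "a")  # nouns, verbs, adjectives

def extract_semantic_guide_words(reply):
    pairs = pseg.lcut(reply)  # word segmentation with POS tagging
    return [p.word for p in pairs if p.flag.startswith(CONTENT_POS_PREFIXES)]

reply = "帮你找一个卖云南芒果的网店吧"   # current-round reply in the corpus
guide_words = extract_semantic_guide_words(reply)
print(guide_words)
# Each guide word would then be converted to its word vector with the preset
# dictionary to form the semantic guidance input sequence {z_1, ..., z_k}.
```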
S720: and coding the semantic guidance input sequence by adopting a second coder to obtain a semantic guidance hidden state vector, wherein the second coder is a coder based on an RNN (radio network node) model, and the semantic guidance hidden state vector is a state value of the last step in a hidden layer of the second coder.
In step S720, the second encoder is an RNN model-based encoder and may be based on any one of the simple RNN model, the LSTM model, the GRU model, and the bidirectional RNN model, which is not limited in this application. The encoded hidden state vector of the hidden layer at the k-th time in the second encoder is denoted r_k. The semantic guidance hidden state vector is r_k at the maximum value of k, and it is also denoted h_0.
Referring to fig. 6, the second encoder in fig. 6 adopts a bidirectional RNN model, which includes forward and backward encoding processes, so the encoded hidden state vector of the hidden layer at the k-th time in the second encoder is the concatenation of the forward and backward hidden state vectors at that time. The semantic guidance hidden state vector is this concatenated vector at the maximum value of k, and it is likewise denoted h_0.
Assume that the semantic guide word is "online store eos", where eos is an end character. The process of the second encoder encoding the semantic guidance input sequence is exemplified below with a second encoder based on a simple RNN model.
A randomly set vector is used as the initial value of the hidden layer of the second encoder, and the word vector of the first word "net" of the semantic guidance input sequence is used as the input value of the input layer of the second encoder; the encoded hidden state vector r_1 at the first time is obtained by calculation. Then, taking r_1 as the input value of the hidden layer of the second encoder and the word vector of the second word "store" as the input value of the input layer, the encoded hidden state vector r_2 at the second time is calculated. Then, taking r_2 as the input value of the hidden layer and the word vector of the end character eos as the input value of the input layer, the encoded hidden state vector r_3 at the third time is calculated. Here, r_3 is the semantic guidance hidden state vector, which is also denoted h_0. At this point, the second encoder completes the encoding process for the semantic guidance input sequence.
The process of encoding the semantic guidance input sequence by using the second encoder based on the bi-directional RNN model is similar to this, except that the encoding process includes a forward encoding process and a reverse encoding process, and reference may be made to the schematic diagram of the second encoder in fig. 6, which is not described herein again.
S730: and horizontally connecting the semantic guidance hidden state vector with the first decoding hidden state vector to obtain a first controlled hidden state vector.
Following the example in step S720, the semantic guidance hidden state vector is h_0 and the first decoding hidden state vector is s_z; connecting the two horizontally gives the first controlled hidden state vector [h_0, s_z]. Here, h_0 and s_z are both expressed as matrices. Assuming h_0 is represented as a 3 × 3 matrix and s_z as a 3 × 4 matrix, the first controlled hidden state vector [h_0, s_z] is represented as a 3 × 7 matrix.
If the second encoder and the first encoder are both realized by a bidirectional RNN model, the semantic guidance hidden state vector and the first decoding hidden state vector are each the concatenation of a forward and a backward hidden state vector; connecting the two horizontally in the same way gives the first controlled hidden state vector.
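In code, this horizontal connection is an ordinary matrix concatenation along the column axis; a minimal numpy sketch with the 3 × 3 and 3 × 4 shapes from the example above:

```python
# "Horizontal connection" = column-wise concatenation; values are placeholders.
import numpy as np

h0 = np.ones((3, 3))  # semantic guidance hidden state vector
sz = np.ones((3, 4))  # first decoding hidden state vector
first_controlled = np.concatenate([h0, sz], axis=1)
print(first_controlled.shape)  # (3, 7)
```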
S201: and coding the first input sequence by adopting the first coder by taking the first controlled hidden state vector as an initial value of a hidden layer of the first coder to obtain a second coded hidden state vector.
In the step of S201, the process of the first encoder encoding the first input sequence is similar to the process in the step of S200. The difference is that the step S200 uses the first decoding hidden state vector as the initial value of the hidden layer of the first encoder, and the step S201 uses the first controlled hidden state vector as the initial value of the hidden layer of the first encoder.
In this embodiment, through steps S710, S720, S730, and S201, the semantic guide words extracted from the reply information in the corpus are introduced into the training process of the reply information generation model. Therefore, when reply information is generated with the trained model, a second keyword preset by the user, or obtained according to a certain rule, can be introduced to guide the generation of the reply information, so that better and more accurate reply information is obtained.
For example, if no semantic guide word is introduced for training, the training results in the reply information generation model M1.
The input information input by the user is as follows: the people on the photo are my girlfriend.
Then, through the prediction of the model M1 and through conversion, the obtained reply information is: Yes.
If a semantic guide word is introduced for training, the reply information generation model M2 is obtained.
The input information input by the user is as follows: the people on the photo are my girlfriend.
The second keyword acquired for semantic guidance is: beautiful.
Then, through the prediction of the model M2 and through conversion, the obtained reply information is: So beautiful!
As can also be seen from this example, when a semantic guide word is introduced to train the reply information generation model and reply information is generated using the trained model, more accurate reply information can be obtained.
S300: and decoding the second coding hidden state vector by adopting a decoder to obtain a first output sequence, wherein the decoder is based on an RNN model.
S400: and calculating the error between a standard output sequence and the first output sequence, wherein the standard output sequence is obtained by converting reply information of the current round of conversation in the training corpus.
S500: updating parameters of the first encoder and the decoder according to the error if the error is above a preset end threshold.
In this embodiment, the steps S300 to S500 may refer to the descriptions of the steps S300 to S500 in the first embodiment, and are not described herein again.
If the reply information generation model relies only on the second encoded hidden state vector throughout the encoding and decoding processes, each word vector x_i in the first input sequence makes the same contribution to every decoded output value y_j in the decoder. In such an approach, the first encoder compresses the information of the entire first input sequence into a fixed-length vector, which easily causes two problems. First, the second encoded hidden state vector cannot fully convey the information of the entire first input sequence. Second, information input to the encoder's input layer at the beginning of the first input sequence is easily overwritten by information input later, so that much detail information is lost; this is especially apparent when the first input sequence is long.
Therefore, an attention mechanism is introduced in the decoding process of the training method of the reply information generation model, so that the contribution of each word vector x_i in the first input sequence to the decoded output value y_j can be reflected, and the decoded output value y_j can focus more on the word vectors in the first input sequence that are most relevant to y_j, thereby improving the accuracy of the obtained reply information.
Specifically, in one implementation, the decoding, by the decoder, the second encoded hidden state vector in the step of S300 to obtain the first output sequence may include the steps of S301 to S303.
S301: calculating the attention assignment weights α_ji of the decoding hidden state vector s_j at the j-th time in the decoder with respect to the semantic guidance hidden state vector h_0 in the second encoder and to all encoded hidden state vectors {h_1, ..., h_i, ..., h_n} in the first encoder, respectively, where i = 0, 1, 2, ..., n; j = 1, 2, ..., m; n is the total number of encoded hidden state vectors h_i in the first encoder, and m is the total number of output values y_j in the output sequence of the decoder.
When i = 0, α_j0 represents the attention assignment weight of the output value y_j at the j-th time in the decoder to the semantic guidance hidden state vector h_0 in the second encoder. The higher the value of α_j0, the more attention the output value y_j at the j-th time in the decoder allocates to the semantic guidance input sequence in the second encoder, and the greater the effect of the semantic guidance hidden state vector h_0 when y_j is generated in the decoding process.
When i = 1, 2, ..., n, α_ji represents the attention assignment weight of the output value y_j at the j-th time in the decoder to each of the encoded hidden state vectors {h_1, ..., h_i, h_{i+1}, ...} in the first encoder. The higher the value of α_ji, the more attention the output value y_j at the j-th time in the decoder allocates to the i-th encoded hidden state vector h_i in the first encoder. It can also be understood as follows: the more attention the output value y_j allocates to the i-th input value x_i of the first encoder, the greater the effect of x_i when y_j is generated in the decoding process.
In one implementation manner, the alignment score e_ji can be calculated by the following formula:
e_ji = v_a · tanh(W_a · s_{j-1} + U_a · h_i), where v_a, W_a and U_a are parameter values that may be represented by matrices.
Here, e_ji is in effect an alignment model, a feedforward neural network nested in the RNN model, which is trained together with the reply information generation model.
S302: calculating the attention weights α_ji from the scores e_ji with a softmax function and obtaining the weighted average c_j, where j = 1, 2, ..., m.
Here, c_j is a weighted sum of the semantic guidance hidden state vector h_0 from the second encoder and the set of encoded hidden state vectors {h_1, ..., h_i, h_{i+1}, ...} of the first encoder at encoding.
Specifically, α_ji = exp(e_ji) / Σ_{i'=0..n} exp(e_ji'), and c_j = Σ_{i=0..n} α_ji · h_i.
s303: decoding the second encoded hidden state vector with a decoder to obtain a first output sequence { y }1’,...,yj’,...,ym’},yj’=g(yj-1,sj,cj),sj=f(yj-1,sj-1,cj) Where f is generally a nonlinear activation function, such as tanh or Re L U, and g is a softmax function.
In one implementation, sj=f(Uyj-1+Wsj-1+cj)。
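A minimal numpy sketch of the attention computation in steps S301 to S303 follows, using the additive score e_ji = v_a · tanh(W_a · s_{j-1} + U_a · h_i) given above for a single decoding step j. All dimensions and parameter values are illustrative random placeholders, not trained values.

```python
# Sketch of S301-S302: additive attention scores, softmax weights, and the
# weighted average c_j over h_0 (semantic guidance) and h_1..h_n (encoder).
import numpy as np

rng = np.random.default_rng(1)
h_dim, n = 5, 4                                      # hidden size, encoder steps
hs = [rng.normal(size=h_dim) for _ in range(n + 1)]  # h_0, h_1, ..., h_n
s_prev = rng.normal(size=h_dim)                      # s_{j-1}
v_a = rng.normal(size=h_dim)
W_a = rng.normal(size=(h_dim, h_dim))
U_a = rng.normal(size=(h_dim, h_dim))

e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h) for h in hs])  # e_j0..e_jn
alpha = np.exp(e) / np.exp(e).sum()                  # softmax -> attention weights
c_j = sum(a * h for a, h in zip(alpha, hs))          # weighted average c_j
print(alpha.sum(), c_j.shape)                        # 1.0, (5,)
```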
The use phase of the reply information generation model, that is, the process of generating the reply information using the reply information generation model, will be described below by way of the third embodiment and the fourth embodiment.
Referring to fig. 8 and 9, in a third embodiment of the present application, a reply information generation method is provided, which includes steps S800 and S900.
S800: and inputting the second input sequence into the reply information generation model for encoding and decoding by taking the third decoding hidden state vector as an initial value of the hidden layer of the first encoder to obtain a second output sequence.
Here, the third decoding hidden state vector is the state value of the last step in the hidden layer of the decoder of the reply information generation model in the previous round of session; the second input sequence is obtained by converting the second input information input by the user in the current round of session; and the reply information generation model is obtained by training with the training method of any one of the foregoing first and second embodiments.
In step S800, the second input sequence is converted from the second input information input by the user in the current round of session. Specifically, each word of the input information may be converted into a word vector by using a preset dictionary, where the preset dictionary includes a corresponding relationship between each word and the word vector. The word vector for each word can be trained in advance using existing methods, for example, word2vec, which is a tool for word vector calculation. The preset dictionary may be the same as the preset dictionary used for training the reply information generation model.
The process of encoding and decoding the second input sequence input to the reply information generation model is similar to the process of encoding and decoding when the reply information generation model is trained. The difference is that in the training process, the word vector of each word of the current round reply information in the training corpus is used as the input value of the input layer of the decoder during decoding; in the use process, the output value decoded at the previous moment in the decoder is used as the input value of the input layer at the next moment in the decoding process.
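A minimal sketch of this use-phase (greedy) decoding follows, with the decoder feeding its own previous output back as the next input; the vocabulary, embeddings, and parameters are hypothetical placeholders, and decoding stops at the end character or a maximum length.

```python
# Sketch of use-phase decoding: the previous output y_{j-1}' becomes the
# next input; parameters U, W, V and the embedding table are illustrative.
import numpy as np

def greedy_decode(U, W, V, embed, h_z, eos_id, max_len=20):
    s, prev_id, out_ids = h_z, eos_id, []
    for _ in range(max_len):
        s = np.tanh(U @ embed[prev_id] + W @ s)        # s_j = f(U·y_{j-1}' + W·s_{j-1})
        logits = V @ s
        probs = np.exp(logits) / np.exp(logits).sum()  # y_j' = softmax(V·s_j)
        prev_id = int(np.argmax(probs))                # feed output back as next input
        if prev_id == eos_id:
            break
        out_ids.append(prev_id)
    return out_ids                                     # the second output sequence

rng = np.random.default_rng(2)
vocab, d, h_dim = 10, 4, 5
U = rng.normal(size=(h_dim, d))
W = rng.normal(size=(h_dim, h_dim))
V = rng.normal(size=(vocab, h_dim))
embed = rng.normal(size=(vocab, d))
print(greedy_decode(U, W, V, embed, rng.normal(size=h_dim), eos_id=0))
```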
Referring to fig. 9, fig. 9 is a schematic diagram illustrating steps of encoding and decoding of the reply information generation model in a time sequence during use, wherein a first encoder in the reply information generation model is a simple RNN model, and a decoder in the reply information generation model is a simple RNN model.
The first encoder includes an input layer and a hidden layer. The set of input values in the input layer is denoted {x_1, ..., x_i, x_{i+1}, ...}; the set of hidden state vectors in the hidden layer is denoted {h_1, ..., h_i, h_{i+1}, ...}.
The decoder includes an input layer, a hidden layer, and an output layer. The set of hidden state vectors in the hidden layer is denoted {s_1, ..., s_j, s_{j+1}, ...}; the set of output values in the output layer is denoted {y_1', ..., y_j', y_{j+1}', ...}.
x_i is the input value of the input layer of the first encoder at the i-th time.
h_i is the encoded hidden state vector of the hidden layer of the first encoder at the i-th time, where h_i = f(x_i, h_{i-1}) and f is generally a nonlinear activation function, such as tanh or ReLU. One common calculation of h_i is: h_i = f(U·x_i + W·h_{i-1}).
y_{j-1}' is the input value of the input layer of the decoder at the j-th time, namely the output value decoded at the previous time.
s_j is the decoding hidden state vector of the hidden layer of the decoder at the j-th time, s_j = f(y_{j-1}', s_{j-1}), where f is generally a nonlinear activation function, such as tanh or ReLU. One common calculation of s_j is: s_j = f(U·y_{j-1}' + W·s_{j-1}).
y_j' is the output value of the output layer of the decoder at the j-th time, y_j' = g(s_j), where g is a softmax function. More specifically, one common calculation of y_j' is: y_j' = softmax(V·s_j).
h_z is the third encoded hidden state vector.
s_z is the third decoding hidden state vector.
U is the weight from the input layer to the hidden layer in the first encoder and in the decoder, and can be represented by a matrix.
W is the weight for the hidden state transfer from one step to the next in the hidden layer of the first encoder and of the decoder, and can be represented by a matrix.
V is the weight from the hidden layer to the output layer in the decoder, and can be represented by a matrix.
S900: and converting the second output sequence into a second reply message.
The conversion of the second output sequence into the second reply information may also be performed by using the preset dictionary: as described above, the preset dictionary includes the correspondence between each word and its word vector, and each word vector in the second output sequence is converted in turn into the corresponding word, yielding the second reply information. The preset dictionary here may be the same as the one used in step S800.
In the reply information generation method in this embodiment, a trained reply information generation model is used, a state value of a last step in an implicit layer of a decoder in a previous session is used as an initial value of an implicit layer of a first encoder in a next session, a second input sequence of the next session is encoded, and the encoded second input sequence is decoded by the decoder, so that session information of the previous session or previous sessions is introduced into a generation process of reply information of the next session, and more accurate reply information is generated.
For example,
the user input information in the first session is as follows: i want to eat litchi.
Reply information generated in the first round of conversation: the litchi of Guangdong is not wrong.
The user input information in the second session is: can you buy it on the network?
If a conventional intelligent question-answering system is adopted, reply information can only be generated from the information input by the user in the second round of conversation. Because the information available in the second round is limited, the system can only produce preset safe reply information as the reply of the second round, for example: What an interesting question!
With the method of the present application, the state value of the last step in the hidden layer of the decoder in the previous round of conversation contains all or most of the semantics of the previous round's reply information. Using this state value as the initial value of the hidden layer of the first encoder in the next round introduces the conversation information of the previous round or rounds into the generation of the next round's reply, playing a supplementary role, so that the reply information generation model can generate more accurate reply information in the second round of conversation.
For example, by using the method of the present application, the reply information generated in the second round of session is: help you find an internet shop bar selling guangzhou litchis?
Referring to fig. 10, in a fourth embodiment of the present application, a reply information generation method is provided, including the steps of S1010, S1020, S1030, S801, and S900.
S1010: and acquiring a keyword input sequence, wherein the keyword input sequence is obtained by converting a preset second keyword.
Here, the second keyword may be preset directly by the user, or may be preset in another manner. The second keyword is used for performing semantic guidance on the reply information generation process, so that the generated reply information can be more accurate and better conforms to the natural language communication habit of the user.
Referring to fig. 11, in one implementation manner, the obtaining the keyword input sequence may specifically include:
s1011: acquiring second input information input by a user in the current round of conversation;
s1012: extracting a first keyword from second input information, wherein the first keyword is a real word in the second input information;
s1013: acquiring a second keyword associated with the first keyword from a preset statistical library, wherein the statistical library is constructed on the basis of input information and reply information in the training corpus;
s1014: and converting the second keyword into a keyword input sequence.
In step S1012, specifically, the second input information may first be segmented, and during segmentation the part of speech of each segmented word may be labeled at the same time. At least one real word is then extracted from the segmentation result as the first keyword; such a real word can, to some extent, represent the semantics of the second input information input by the user in the current round of conversation.
In the step of S1013, a second keyword corresponding to the first keyword is acquired from a preset statistical library;
the preset statistical library is constructed based on the input information and the reply information in the corpus. Specifically, the method comprises the steps of segmenting words of an input information-reply information pair in a training corpus, extracting at least one first real word from input information, extracting at least one second real word from reply information, and establishing association between each first real word and each second real word. And after all the training corpora are extracted and the association is established, counting the second real words associated with each first real word to obtain statistical data. The probability that when the first real word appears in the input information, the second real word associated with the first real word appears in the reply information can be obtained from the statistical data.
Therefore, the statistical library includes at least one first real word, at least one second real word, and the probability of the second real word associated with each first real word appearing in the reply message.
For example, the first real word is "eat", and the second real words associated therewith are "good eat", "rice", and "noodles". When the input information has "eat", the probability of the second real word "good eating" appearing in the reply information is 0.6, the probability of the second real word "rice" appearing in the reply information is 0.2, and the probability of the second real word "noodle" appearing in the reply information is 0.2.
If the first keyword extracted from the second input information in the step S1012 is "eat", in this step, the second real word associated with "eat" is randomly acquired from the statistical library according to the probability distribution with "eat" as the first real word, and is used as the second keyword.
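A minimal sketch of building the statistical library and sampling a second keyword by probability (steps S1012 and S1013) follows; the corpus pairs and extracted real words are illustrative, following the "eat" example above.

```python
# Sketch of S1012-S1013: count second real words per first real word, then
# draw a second keyword according to its probability in the reply corpus.
import random
from collections import Counter, defaultdict

# (first real words from input, second real words from reply) per corpus pair
corpus_pairs = [
    (["eat"], ["good eat"]),
    (["eat"], ["good eat", "rice"]),
    (["eat"], ["noodles"]),
]

stats = defaultdict(Counter)
for first_words, second_words in corpus_pairs:
    for fw in first_words:
        stats[fw].update(second_words)

def sample_second_keyword(first_keyword):
    counter = stats[first_keyword]
    words, counts = zip(*counter.items())
    return random.choices(words, weights=counts, k=1)[0]

print(sample_second_keyword("eat"))  # e.g. "good eat" with probability 0.5 here
```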
In step S1014, the second keyword is converted into a keyword input sequence: each word of the second keyword may be converted into a word vector by using a preset dictionary, so as to obtain the keyword input sequence. The preset dictionary may be the same as the one used for training the reply information generation model.
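Step S1014 then amounts to a table lookup. In the sketch below, word_vectors is a toy stand-in for the preset dictionary (for example, a word2vec table shared with model training); its contents are an assumption.

```python
# A minimal sketch of step S1014; word_vectors is a toy stand-in for the
# preset dictionary used during training of the model.
import numpy as np

embedding_dim = 128
word_vectors = {"delicious": np.random.rand(embedding_dim)}  # stand-in table

def to_keyword_input_sequence(keyword_words):
    """Map each word of the second keyword to its preset word vector."""
    return [word_vectors[w] for w in keyword_words]
```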
S1020: encoding the keyword input sequence with a second encoder to obtain a keyword hidden state vector, wherein the second encoder is an encoder based on an RNN model, and the keyword hidden state vector is the state value of the last step in the hidden layer of the second encoder.
S1030: horizontally connecting the keyword hidden state vector with the third decoding hidden state vector to obtain a second controlled hidden state vector.
In steps S1020 and S1030, the process of encoding the keyword input sequence with the second encoder is similar to the process of encoding the semantic guidance input sequence with the second encoder during training; reference may be made to the description of step S720 in the second embodiment. Likewise, the process of horizontally connecting the keyword hidden state vector with the third decoding hidden state vector is similar to the process of horizontally connecting the semantic guidance hidden state vector with the first decoding hidden state vector during training; reference may be made to the description of step S730 in the second embodiment, and details are not repeated here.
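To make the data flow concrete, here is a minimal numpy sketch of steps S1020 and S1030 under two assumptions: the second encoder is reduced to a vanilla RNN cell with random stand-in weights, and "horizontal connection" is read as vector concatenation, optionally projected back to the first encoder's hidden size (the embodiment does not fix the dimensions).

```python
# A minimal numpy sketch of S1020 (encode, keep the last hidden state)
# and S1030 (horizontal connection as concatenation); the weights are
# random stand-ins for the trained second-encoder parameters.
import numpy as np

hidden_dim, input_dim = 64, 128
W_xh = np.random.randn(hidden_dim, input_dim) * 0.01
W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.01

def rnn_encode(input_sequence, init_hidden=None):
    """Return the state value of the last step in the hidden layer."""
    h = np.zeros(hidden_dim) if init_hidden is None else init_hidden
    for x in input_sequence:
        h = np.tanh(W_xh @ x + W_hh @ h)  # one recurrent step
    return h

def horizontal_connect(keyword_hidden, third_decoding_hidden, W_proj=None):
    """Concatenate the two vectors; the optional projection (an assumption)
    maps the result back to the first encoder's hidden size."""
    controlled = np.concatenate([keyword_hidden, third_decoding_hidden])
    return W_proj @ controlled if W_proj is not None else controlled
```

In step S801 below, the first encoder would run the same kind of recurrence over the second input sequence, with the second controlled hidden state vector passed as init_hidden instead of a zero vector.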
S801: inputting the second input sequence into the reply information generation model for encoding and decoding, with the second controlled hidden state vector as the initial value of the hidden layer of the first encoder, to obtain a second output sequence.
Here, the second input sequence is obtained by converting the second input information input by the user in the current round of conversation, and the reply information generation model is obtained by training with the method for training a reply information generation model according to either the first embodiment or the second embodiment.
In step S801, the process by which the first encoder encodes the second input sequence is similar to that in step S800 of the third embodiment. The difference is that in step S800 the third decoding hidden state vector is used as the initial value of the hidden layer of the first encoder, whereas in step S801 the second controlled hidden state vector is used.
S900: converting the second output sequence into second reply information.
In this embodiment, for step S900, reference may be made to the description of step S900 in the third embodiment, and details are not described herein again.
In this embodiment, through steps S1010, S1020, S1030, and S801, a second keyword preset by the user, or obtained according to a certain rule, is introduced to guide the generation of the reply information, so that better and more accurate reply information can be obtained.
Referring to fig. 12, in a fifth embodiment of the present application, there is provided a training apparatus for a reply information generation model, including:
an acquisition unit 1, configured to acquire a first input sequence, wherein the first input sequence is obtained by converting input information of the current round of conversation in the training corpus;
a training unit 2, configured to: use a first decoding hidden state vector as the initial value of a hidden layer of a first encoder and encode the first input sequence with the first encoder to obtain a second encoding hidden state vector; decode the second encoding hidden state vector with a decoder to obtain a first output sequence; calculate the error between a standard output sequence and the first output sequence; and update parameters of the first encoder and the decoder according to the error if the error is above a preset end threshold; wherein the first encoder is an encoder based on an RNN model, the first decoding hidden state vector is the state value of the last step in the hidden layer of the decoder in the previous round of conversation, the decoder is a decoder based on an RNN model, and the standard output sequence is obtained by converting reply information of the current round of conversation in the training corpus.
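The control flow of the training unit can be summarized as below; every name here (encode, decode, loss, update_parameters) is a hypothetical stand-in for whatever seq2seq implementation the apparatus wraps, not an API defined by this disclosure.

```python
# A minimal sketch of one training step of the training unit; all model
# method names are hypothetical stand-ins.
def train_step(model, first_input_seq, standard_output_seq,
               first_decoding_hidden, end_threshold):
    enc_hidden = model.encode(first_input_seq, init_hidden=first_decoding_hidden)
    first_output_seq = model.decode(enc_hidden)
    error = model.loss(standard_output_seq, first_output_seq)
    if error > end_threshold:
        model.update_parameters(error)   # e.g. one gradient step
        return False                     # keep iterating
    return True                          # error low enough: training ends
```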
Optionally, the acquisition unit 1 is further configured to obtain a semantic guidance input sequence, wherein the semantic guidance input sequence is obtained by converting semantic guide words, and the semantic guide words are words representing the semantics of the reply information of the current round in the training corpus;
the training unit 2 is further configured to: encode the semantic guidance input sequence with a second encoder to obtain a semantic guidance hidden state vector; horizontally connect the semantic guidance hidden state vector with the first decoding hidden state vector to obtain a first controlled hidden state vector; and encode the first input sequence with the first encoder, using the first controlled hidden state vector as the initial value of the hidden layer of the first encoder, to obtain the second encoding hidden state vector; wherein the second encoder is an encoder based on an RNN model, and the semantic guidance hidden state vector is the state value of the last step in the hidden layer of the second encoder.
Optionally, the acquisition unit 1 is further configured to: obtain reply information of the current round of conversation in the training corpus; perform word segmentation on the reply information; extract semantic guide words from the word segmentation result; and convert the semantic guide words into a semantic guidance input sequence.
Optionally, the training unit 2 is further configured to: calculate the attention assignment weight e_ij between the decoding hidden state vector s_j at the j-th time in the decoder and, separately, the semantic guidance hidden state vector h_0 in the second encoder and each encoding hidden state vector in {h_1, ..., h_i, ..., h_n} in the first encoder, wherein i = 0, 1, 2, ..., n, j = 1, 2, ..., m, n is the total number of encoding hidden state vectors h_i in the first encoder, and m is the total number of output values y_j in the output sequence of the decoder; calculate, using a softmax function, a_ij = exp(e_ij) / Σ_{k=0}^{n} exp(e_kj), and obtain the weighted average c_j = Σ_{i=0}^{n} a_ij h_i; and decode the second encoding hidden state vector with the decoder to obtain the first output sequence {y_1', ..., y_j', ..., y_m'}, wherein y_j' = g(y_{j-1}, s_j, c_j) and s_j = f(y_{j-1}, s_{j-1}, c_j), f is a nonlinear activation function, g is a softmax function, y_{j-1} is the input value of the input layer of the decoder at the j-th time, and s_j is the decoding hidden state vector of the hidden layer of the decoder at the j-th time.
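For concreteness, the attention step just described can be sketched as follows; the dot-product scoring function is an assumption of this sketch, since the embodiment does not fix how e_ij is computed.

```python
# A minimal numpy sketch of the attention step: score the decoder state
# against h_0 (from the second encoder) and h_1..h_n (from the first
# encoder), normalize with softmax, and form the weighted average c_j.
import numpy as np

def attention_context(s_j, hidden_states):
    """hidden_states: array of shape (n + 1, hidden_dim), rows h_0..h_n."""
    e = hidden_states @ s_j            # scores e_ij (dot product, assumed)
    a = np.exp(e - e.max())
    a /= a.sum()                       # softmax weights a_ij
    return a @ hidden_states           # c_j = sum_i a_ij * h_i
```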
Optionally, the training unit 2 is further configured to determine, if the error is lower than or equal to the preset end threshold, the current parameters of the first encoder and the decoder as the parameters of the reply information generation model.
Referring to fig. 13, in a sixth embodiment of the present application, there is provided a reply information generating apparatus, including:
a generating unit 3, configured to input the second input sequence into the reply information generation model for encoding and decoding, with the third decoding hidden state vector as the initial value of the hidden layer of the first encoder, to obtain a second output sequence; wherein the third decoding hidden state vector is the state value of the last step in the hidden layer of the decoder of the reply information generation model in the previous round of conversation, the second input sequence is obtained by converting the second input information input by the user in the current round of conversation, and the reply information generation model is obtained by training with any one of the above methods for training a reply information generation model;
and a conversion unit 4, configured to convert the second output sequence into second reply information.
Optionally, the generating unit 3 is further configured to: obtain a keyword input sequence; encode the keyword input sequence with a second encoder to obtain a keyword hidden state vector; horizontally connect the keyword hidden state vector with the third decoding hidden state vector to obtain a second controlled hidden state vector; and input the second input sequence into the reply information generation model for encoding and decoding, with the second controlled hidden state vector as the initial value of the hidden layer of the first encoder, to obtain the second output sequence; wherein the keyword input sequence is obtained by converting a preset second keyword, the second encoder is an encoder based on an RNN model, and the keyword hidden state vector is the state value of the last step in the hidden layer of the second encoder.
Optionally, the generating unit 3 is further configured to: obtain second input information input by the user in the current round of conversation; extract a first keyword from the second input information; acquire a second keyword associated with the first keyword from a preset statistical library; and convert the second keyword into a keyword input sequence; wherein the first keyword is a real word in the second input information, and the statistical library is constructed based on the input information and the reply information in the training corpus.
It should be noted that terms such as the "cell" of an RNN model and "word2vec" have no unified Chinese translation among those skilled in the art, who customarily refer to them by their English originals. Therefore, to avoid ambiguity in translation, the English terms are likewise used in the embodiments; those skilled in the art will understand them.
For identical or similar parts among the various embodiments in this specification, reference may be made to one another. The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.